I ran the Harvester, but the output does not contain any CLUMPP pop files.

Check that your Structure results files contain a table titled "Proportion of membership of each pre-defined population in each of the X clusters", where X is the K value for that run. This table is needed to create the pop files. It will be missing if you do not define any populations a priori, so be sure to check the box "Putative population origin for each individual."
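If you have many results files, you can check for the table by searching each file for its title. This is a hypothetical helper, not part of the Harvester. It assumes the results files are plain text and that the title appears verbatim as quoted above. The file names in the usage note are made up.

```python
def has_population_table(results_text):
    """Return True if a Structure results file's text contains the
    pre-defined population membership table used to build CLUMPP pop files."""
    # Match only the fixed part of the title; the trailing cluster count varies with K.
    marker = "Proportion of membership of each pre-defined population"
    return marker in results_text


def files_missing_table(paths):
    """Return the subset of paths whose contents lack the population table."""
    missing = []
    for path in paths:
        with open(path) as handle:
            if not has_population_table(handle.read()):
                missing.append(path)
    return missing
```

For example, `files_missing_table(["run_K2_1", "run_K3_1"])` (hypothetical names) lists the runs that were set up without population information.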

Special thanks to Tom Devitt for both asking and answering this question!

I'm getting an error, "The file you selected is too large!"

How big is your .zip archive? The Harvester will not accept archives over 50 MB. If you have a larger archive, consider using the standalone version (available here), which has no file size limit.
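You can check an archive against the limit before uploading. This is a minimal sketch that assumes "50 MB" means 50 × 1024 × 1024 bytes; the exact byte count the server uses is not stated here, so leave some headroom near the boundary. The archive name in the docstring is hypothetical.

```python
import os

# Assumption: the 50 MB web limit is interpreted as 50 MiB.
WEB_LIMIT_BYTES = 50 * 1024 * 1024


def fits_web_limit(size_bytes, limit_bytes=WEB_LIMIT_BYTES):
    """Return True if an archive of size_bytes does not exceed the web upload limit."""
    return size_bytes <= limit_bytes


def archive_fits(path):
    """Check an archive on disk, e.g. archive_fits("structure_results.zip")."""
    return fits_web_limit(os.path.getsize(path))
```

If `archive_fits` returns False, the standalone version is the way to go.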

How do I cite the Harvester?

Earl, Dent A. and vonHoldt, Bridgett M. (2012)
STRUCTURE HARVESTER: a website and program for visualizing
STRUCTURE output and implementing the Evanno method.
Conservation Genetics Resources vol. 4 (2) pp. 359-361. doi: 10.1007/s12686-011-9548-7


I think you're calculating Evanno's delta K wrong.

That's not actually a question; it's more of a statement, really.

Following an email exchange in March 2011 with Dr. Jerome Goudet, the corresponding author of Evanno et al. (2005), I changed the way the Harvester performs the Evanno method. The new method (v0.6 onward) does not produce standard deviation measurements for the Ln'(K) and |Ln''(K)| values. The old method did, and it led to slightly different values of delta K, though I did not see any large changes in data sets run through both the old and new methods.

But what about Evanno's Figure 2 parts B and C?

I emailed the corresponding author of the paper, Dr. Jerome Goudet, and he told me that there should not have been standard deviation bars on those figures. Here's what Dr. Goudet said, starting with an algorithm to calculate delta K:

1/ average the L(K) over the x (say 20) replicates
2/ estimate from these averages L''(K) as abs( L(K+1) - 2L(K) + L(K-1) )
3/ divide by the standard deviation of L(K) (sd of the different replicates for the same K)

As for our paper, the sd for L'(K) and L''(K) were calculated "wrongly",
to give an idea of the variation at each step.  But this should not be done.
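The three steps can be sketched in Python starting from raw replicate values. This is an illustrative version, not the Harvester's code: it assumes the input is a dict mapping each K to its list of replicate L(K) values, that each K has at least two replicates with some spread, and that "standard deviation" means the sample standard deviation (the email does not say which).

```python
from statistics import mean, stdev


def evanno_delta_k(ln_probs_by_k):
    """Compute delta K from replicate L(K) values.

    ln_probs_by_k maps each K (int) to a list of L(K) values, one per replicate.
    Returns {K: delta K} for every K whose neighbors K-1 and K+1 are both present.
    """
    # Step 1: average L(K) over the replicates for each K.
    means = {k: mean(values) for k, values in ln_probs_by_k.items()}
    # Sample sd of the replicates for the same K (assumption), used in step 3.
    sds = {k: stdev(values) for k, values in ln_probs_by_k.items()}
    delta_k = {}
    for k in sorted(means):
        if k - 1 in means and k + 1 in means:
            # Step 2: |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, computed on the averages.
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            # Step 3: divide by the sd of L(K); no sd is computed for L'(K) or L''(K).
            delta_k[k] = second_diff / sds[k]
    return delta_k


data = {
    1: [-102.0, -101.0, -100.0],
    2: [-82.0, -81.0, -80.0],
    3: [-77.0, -76.0, -75.0],
    4: [-76.0, -75.0, -74.0],
}
print(evanno_delta_k(data))  # {2: 15.0, 3: 4.0}
```

With K = 1..4, delta K exists only for K = 2 and 3, because the smallest and largest K lack a neighbor on one side.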

Show me the code.

The relevant function from the Python code is shown below. A standalone version of the code is available here: structureHarvester.

You can also browse the core functions in the project's GitHub repository.

def calculatePrimesDoublePrimesDeltaK( data ):
   """ This function takes in the data object and uses the
   estimated log probability means dictionary (data.estLnProbMeans) and the estimated
   log probability standard deviations dictionary (data.estLnProbStdevs) to
   calculate dictionaries keyed on K values (ints) for the three Evanno quantities of
   L'(K) : data.LnPK
   L''(K) : data.LnPPK
   delta K : data.deltaK
   Note that to calculate the deltaK for 'thisK' you need estimated log prob mean
   values for both the previous K, 'prevK', and the next K, 'nextK'. So if you run
   Structure for K = 1..20, you'll only get delta K for K = 2..19.
   """
   data.LnPK = {}
   data.LnPPK = {}
   data.deltaK = {}
   # L'(K) = L(K) - L(K-1), defined for every K except the smallest.
   for i in range( 1, len( data.sortedKs )):
      prevK = data.sortedKs[ i - 1 ]
      thisK = data.sortedKs[ i ]
      data.LnPK[ thisK ] = data.estLnProbMeans[ thisK ] - data.estLnProbMeans[ prevK ]
   # L''(K) and delta K need a neighbor on each side, so the first and last K are skipped.
   for i in range( 1, len( data.sortedKs ) - 1 ):
      prevK = data.sortedKs[ i - 1 ]
      thisK = data.sortedKs[ i ]
      nextK = data.sortedKs[ i + 1 ]
      data.LnPPK[ thisK ] = abs( data.LnPK[ nextK ] - data.LnPK[ thisK ] )
      # Equivalent form, since LnPPK[ thisK ] equals |L(K+1) - 2L(K) + L(K-1)|:
      # data.deltaK[ thisK ] = data.LnPPK[ thisK ] / float( data.estLnProbStdevs[ thisK ] )
      # delta K = |L(K+1) - 2L(K) + L(K-1)| / sd of L(K) over the replicates.
      data.deltaK[ thisK ] = abs( data.estLnProbMeans[ nextK ] -
                                  2.0 * data.estLnProbMeans[ thisK ] +
                                  data.estLnProbMeans[ prevK ] ) / float( data.estLnProbStdevs[ thisK ] )
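The commented-out line and the active line in the function compute the same delta K. Writing \bar{L}(K) for data.estLnProbMeans, s(K) for data.estLnProbStdevs, and K-1, K+1 for the neighboring entries of data.sortedKs, the algebra is:

```latex
\begin{aligned}
L'(K)      &= \bar{L}(K) - \bar{L}(K-1) \\
|L''(K)|   &= |L'(K+1) - L'(K)| \\
           &= |\bar{L}(K+1) - \bar{L}(K) - \bar{L}(K) + \bar{L}(K-1)| \\
           &= |\bar{L}(K+1) - 2\bar{L}(K) + \bar{L}(K-1)| \\
\Delta K   &= \frac{|L''(K)|}{s(K)}
\end{aligned}
```

Only s(K), the standard deviation of L(K) across replicates, enters the result; no standard deviation is carried for L'(K) or L''(K).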

© Dent Earl 2007-2014