Inferring weak population structure with the assistance of sample group information

Abstract

Genetic clustering algorithms require a certain amount of data to produce informative results. In the common situation that individuals are sampled at several locations, we show how sample group information can be used to achieve better results when the amount of data is limited. New models are developed for the structure program, both for the cases of admixture and no admixture. These models work by modifying the prior distribution for each individual's population assignment. The new prior distributions allow the proportion of individuals assigned to a particular cluster to vary by location. The models are tested on simulated data, and illustrated using microsatellite data from the CEPH Human Genome Diversity Panel. We demonstrate that the new models allow structure to be detected at lower levels of divergence, or with less data, than the original structure models or principal components methods, and that they are not biased towards detecting structure when it is not present. These models are implemented in a new version of structure which is freely available online at http://pritch.bsd.uchicago.edu/structure.html.
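The idea of a location-dependent prior can be illustrated with a small sketch. This is not the implementation used in structure. It only shows, under simplifying assumptions, how a prior that varies by sampling location can shift an individual's cluster assignment when the genetic data are weakly informative. The function name `posterior_assignment` and all numerical values are hypothetical and chosen for illustration.

```python
import math

def posterior_assignment(log_lik, prior):
    """Posterior cluster probabilities for one individual.

    log_lik: log-likelihood of the individual's genotype under each cluster
    prior:   prior probability of each cluster (must be positive, sums to 1)
    """
    # Work in log space and subtract the maximum for numerical stability.
    log_post = [ll + math.log(p) for ll, p in zip(log_lik, prior)]
    m = max(log_post)
    weights = [math.exp(x - m) for x in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical weakly informative data: cluster 0 is only slightly favoured.
log_lik = [-10.0, -10.2]

# Original-style prior: the same for every individual, regardless of location.
uniform_prior = [0.5, 0.5]

# Location-informed prior (assumed values): at this individual's sampling
# location, most individuals are assigned to cluster 1.
location_prior = [0.2, 0.8]

flat = posterior_assignment(log_lik, uniform_prior)
informed = posterior_assignment(log_lik, location_prior)
```

With the uniform prior the posterior follows the likelihood alone and slightly favours cluster 0. With the location-informed prior the same data favour cluster 1. In the models described above the location-specific proportions are not fixed in advance as they are here, which is an additional simplification of this sketch.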
