Publication | Open Access
Population Structure and Eigenanalysis
Citations: 5.5K | References: 42 | Year: 2006
Current methods for inferring population structure from genetic data lack formal significance tests. Principal components analysis, first applied to genetic data by Cavalli-Sforza et al., offers a promising alternative. The study aims to place principal components analysis in population genetics on a solid statistical foundation by using modern statistical theory to derive formal significance tests. The authors also identify a phase-change phenomenon. For a fixed, large dataset, divergence between two populations (measured, for example, by FST) is essentially undetectable below a threshold, but just above the threshold detection becomes easy. This makes it possible to predict the dataset size required to detect structure.
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon in the ability to detect structure in genetic data. It emerges from the statistical theory we use and has an important implication for discovering structure: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above the threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.
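To make "principal components analysis of genetic data" concrete, here is a minimal sketch, assuming a genotype matrix of allele counts (0, 1, 2) with one row per individual and one column per marker. Each marker is centred and scaled by sqrt(p(1 - p)), where p is the estimated allele frequency, and the leading eigenvectors of the resulting individual-by-individual covariance matrix are returned. The function name `genotype_pca` and the simple estimate p = mean/2 are illustrative assumptions, not the authors' exact normalization, and the sketch omits the formal significance tests the paper develops.

```python
import numpy as np


def genotype_pca(genotypes, k=2):
    """Sketch of PCA on a genotype matrix (hypothetical helper).

    genotypes: (n_individuals, m_markers) array of allele counts in {0, 1, 2}.
    Returns the top-k eigenvalues (descending) and the matching eigenvectors,
    one row per individual.
    """
    g = np.asarray(genotypes, dtype=float)
    mu = g.mean(axis=0)                # per-marker mean allele count
    p = mu / 2.0                       # assumed allele-frequency estimate
    keep = (p > 0) & (p < 1)           # drop monomorphic markers (zero variance)
    scale = np.sqrt(p[keep] * (1.0 - p[keep]))
    m_norm = (g[:, keep] - mu[keep]) / scale   # centred, variance-scaled markers
    x = m_norm @ m_norm.T / m_norm.shape[1]    # n x n covariance across individuals
    evals, evecs = np.linalg.eigh(x)           # eigh returns ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest
    return evals[order], evecs[:, order]
```

With two clearly diverged groups of individuals, the first eigenvector should take opposite signs for the two groups; the sign itself is arbitrary, because any eigenvector can be multiplied by -1.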
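The claim that "we can predict the dataset size needed to detect structure" can be illustrated with a small calculator. This sketch assumes that, for two equal-sized populations, the phase-change threshold takes the form FST* = 1/sqrt(n m), where n is the number of individuals and m is the number of markers. That closed form is not stated in this abstract, so treat it as an assumption to check against the full text. The helpers `fst_threshold` and `markers_needed` are hypothetical names used only for illustration.

```python
import math


def fst_threshold(n_individuals: int, n_markers: int) -> float:
    """Assumed phase-change threshold: FST* = 1 / sqrt(n * m)."""
    return 1.0 / math.sqrt(n_individuals * n_markers)


def markers_needed(fst: float, n_individuals: int) -> int:
    """Smallest marker count m with n * m * fst**2 > 1, i.e. fst above the
    assumed threshold 1 / sqrt(n * m). Hypothetical helper."""
    return math.floor(1.0 / (n_individuals * fst ** 2)) + 1
```

Under this assumption, the threshold scales with the product n m, so doubling the number of individuals halves the number of markers required. For example, with FST = 0.001 and 1,000 individuals, detection would need roughly 1,000 or more markers.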