Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model

TLDR

Gene discovery, heritability estimation, genetic architecture inference, and prediction of complex traits are typically conducted with separate statistical models, which reduces efficiency and power. The study introduces a Bayesian mixture model that simultaneously performs variant discovery, estimates the genetic variance explained by all variants, and predicts phenotypes in new samples. Applied to simulated quantitative traits and WTCCC disease data, the model yielded accurate SNP-based heritability estimates and unbiased estimates of risk in new samples, and it partitioned genetic variation across hundreds to thousands of SNPs. Across the WTCCC diseases, 2,633 to 9,411 SNPs explained all of the SNP-based heritability, and more than 96% of them had small effects. The share of SNP-based variance attributable to large-effect SNPs ranged from near zero in bipolar disorder to 72% in type 1 diabetes. For diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian predictions outperformed profile scoring and mixed-model methods.

Abstract

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
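
To make the idea of partitioning variation across effect-size classes concrete, the following is a minimal forward-simulation sketch, not the authors' inference procedure. It draws SNP effects from a zero-mean normal mixture and reports how much of the summed squared effects falls in each class. The number of components, the variance fractions, the mixing proportions and all function names are assumptions chosen for illustration; none of them is stated in the abstract. Summed squared effects stand in for genetic variance only under the further assumption of standardized, uncorrelated genotypes.

```python
import random

# Hypothetical mixture settings (assumptions for illustration only):
# each component's effect variance is a fixed fraction of the total
# genetic variance; component 0 gives an effect of exactly zero.
VARIANCE_FRACTIONS = (0.0, 1e-4, 1e-3, 1e-2)
MIXING_PROPORTIONS = (0.95, 0.03, 0.015, 0.005)


def simulate_effects(n_snps, genetic_variance, rng):
    """Draw one effect per SNP from a zero-mean normal mixture."""
    effects, labels = [], []
    components = range(len(VARIANCE_FRACTIONS))
    for _ in range(n_snps):
        # Pick the mixture component for this SNP.
        k = rng.choices(components, weights=MIXING_PROPORTIONS)[0]
        sd = (VARIANCE_FRACTIONS[k] * genetic_variance) ** 0.5
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
        labels.append(k)
    return effects, labels


def variance_share_by_component(effects, labels):
    """Fraction of the summed squared effects that falls in each component."""
    totals = [0.0] * len(VARIANCE_FRACTIONS)
    for b, k in zip(effects, labels):
        totals[k] += b * b
    grand = sum(totals)
    return [t / grand if grand > 0 else 0.0 for t in totals]


rng = random.Random(0)  # fixed seed so the run is reproducible
effects, labels = simulate_effects(10_000, 1.0, rng)
shares = variance_share_by_component(effects, labels)

nonzero = sum(1 for k in labels if k != 0)
print(f"SNPs with a non-zero effect: {nonzero}")
for k, share in enumerate(shares):
    print(f"component {k}: {share:.3f} of summed squared effects")
```

In a fitted model the component memberships and proportions would be inferred from the data rather than fixed in advance; the sketch only shows how a few large-effect SNPs can carry a sizeable share of the variance while most non-zero SNPs have small effects.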
