Beyond Missing Heritability: Prediction of Complex Traits

TLDR

Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits, hindering the application of genetic data to preventive and personalized medicine. It is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be-observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach, the number of SNPs included, the validation scheme, and the number of training subjects. In training datasets we explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96), but in validation samples the proportion of variance accounted for drops to 0.15–0.36, depending on the degree of familial information used in the training dataset. These values exceed previous reports based on a reduced number of pre-selected markers (<0.10) but, given the heritability of the trait (∼0.80), leave substantial room for improvement.

Abstract

Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has resulted in limited application of genetic data to preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms (SNPs) concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be-observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36, depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (∼0.80), substantial room for improvement remains.
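
The gap between training and validation R² described above can be illustrated with a small simulation. The sketch below is not the analysis reported here: it assumes simulated, unrelated genotypes rather than Framingham data, uses ridge regression on all markers as a stand-in for the whole-genome models (the abstract does not name the statistical approaches), and all function names, sample sizes, and marker counts are hypothetical choices made for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

def ridge_fit(X, y, lam):
    """Ridge estimates of marker effects, solved in the dual form (cheap when p > n)."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

def train_vs_validation(n_train=300, n_valid=200, p=2000, h2=0.8, seed=0):
    """Return (training R2, validation R2) for a simulated additive trait."""
    rng = np.random.default_rng(seed)
    n = n_train + n_valid

    # Assumption: independent SNPs coded 0/1/2, unrelated subjects.
    freqs = rng.uniform(0.1, 0.9, size=p)
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Additive genetic values scaled so that var(g) = h2 and var(e) = 1 - h2.
    g = X @ rng.normal(size=p)
    g = g / g.std() * np.sqrt(h2)
    y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

    X_tr, X_va = X[:n_train], X[n_train:]
    y_tr, y_va = y[:n_train], y[n_train:]
    mu = y_tr.mean()  # intercept estimated from training subjects only

    # Penalty implied by the simulated variance ratio: lam = p * (1 - h2) / h2.
    beta_hat = ridge_fit(X_tr, y_tr - mu, lam=p * (1.0 - h2) / h2)

    r2_train = r_squared(y_tr, mu + X_tr @ beta_hat)
    r2_valid = r_squared(y_va, mu + X_va @ beta_hat)
    return r2_train, r2_valid

if __name__ == "__main__":
    r2_train, r2_valid = train_vs_validation()
    print(f"training R2 = {r2_train:.2f}, validation R2 = {r2_valid:.2f}")
```

With many more markers than training subjects, the fitted model reproduces the training phenotypes closely, while the validation R² is limited by how well individual marker effects are estimated. Because the simulated subjects are unrelated, the sketch does not capture the gain from familial information noted in the abstract.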
