Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods

TLDR

Prediction of genetic values is a central problem in quantitative genetics, traditionally addressed with phenotypic records and pedigree information. The advent of dense molecular markers introduces statistical and computational challenges that reproducing kernel Hilbert spaces (RKHS) regression can help mitigate. This article reviews RKHS methodology, discusses kernel selection for genetic applications, presents algorithms for kernel choice, and evaluates the approach on 599 wheat lines assessed for grain yield in four mega environments. RKHS regression can be applied to virtually any type of input set (covariates, graphs, strings, images, etc.), offers computational advantages over many parametric approaches, and includes some parametric models as special cases.

Abstract

Prediction of genetic values is a central problem in quantitative genetics. Over many decades, such predictions have been successfully accomplished using information on phenotypic records and family structure usually represented with a pedigree. Dense molecular markers are now available in the genome of humans, plants and animals, and this information can be used to enhance the prediction of genetic values. However, the incorporation of dense molecular marker data into models poses many statistical and computational challenges, such as how models can cope with the genetic complexity of multi-factorial traits and with the curse of dimensionality that arises when the number of markers exceeds the number of data points. Reproducing kernel Hilbert spaces regressions can be used to address some of these challenges. The methodology allows regressions on almost any type of prediction sets (covariates, graphs, strings, images, etc.) and has important computational advantages relative to many parametric approaches. Moreover, some parametric models appear as special cases. This article provides an overview of the methodology, a discussion of the problem of kernel choice with a focus on genetic applications, algorithms for kernel selection and an assessment of the proposed methods using a collection of 599 wheat lines evaluated for grain yield in four mega environments.
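To make the idea concrete, here is a minimal pure-Python sketch of RKHS regression in its kernel-ridge form: fitted genetic values are g(x) = mu + sum_j K(x, x_j) alpha_j with alpha = (K + lambda I)^{-1}(y - mu). The following are all illustrative assumptions and not the article's algorithms: the function names, the Gaussian kernel scaled by marker count, the fixed lambda, taking the intercept as the sample mean, and choosing the bandwidth by hold-out error. The data are synthetic toy values, not the 599 wheat lines. The last lines illustrate one parametric special case: with the linear kernel K = XX', RKHS predictions coincide with ridge regression on the markers.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gaussian_kernel(xi, xj, h):
    """Assumed form exp(-h * d2 / p): squared marker distance scaled by marker count p."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-h * d2 / len(xi))


def linear_kernel(xi, xj, h=None):
    """Inner product of marker codes; h is ignored."""
    return sum(a * b for a, b in zip(xi, xj))


def rkhs_fit(x, y, kernel, h, lam):
    """Kernel ridge form (assumed): alpha = (K + lam*I)^{-1} (y - mu), mu = mean(y)."""
    mu = sum(y) / len(y)
    n = len(x)
    a = [[kernel(x[i], x[j], h) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return mu, solve(a, [yi - mu for yi in y])


def rkhs_predict(model, x_train, x_new, kernel, h):
    """g(x) = mu + sum_j K(x, x_j) * alpha_j."""
    mu, alpha = model
    return [mu + sum(kernel(xn, xt, h) * al for xt, al in zip(x_train, alpha))
            for xn in x_new]


def ridge_predict(x, y, x_new, lam):
    """Parametric ridge regression on markers: (X'X + lam*I) beta = X'(y - mu)."""
    mu = sum(y) / len(y)
    p = len(x[0])
    xtx = [[sum(r[a] * r[b] for r in x) + (lam if a == b else 0.0) for b in range(p)]
           for a in range(p)]
    xty = [sum(r[a] * (yi - mu) for r, yi in zip(x, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return [mu + sum(b * v for b, v in zip(beta, xn)) for xn in x_new]


def select_bandwidth(x_tr, y_tr, x_val, y_val, grid, lam):
    """Hypothetical kernel selection: pick the bandwidth h with lowest hold-out MSE."""
    best_h, best_mse = None, float("inf")
    for h in grid:
        model = rkhs_fit(x_tr, y_tr, gaussian_kernel, h, lam)
        pred = rkhs_predict(model, x_tr, x_val, gaussian_kernel, h)
        mse = sum((p - t) ** 2 for p, t in zip(pred, y_val)) / len(y_val)
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse


# Synthetic toy data (not the wheat lines): 40 lines x 60 markers coded -1/0/1.
rng = random.Random(1)
X = [[rng.choice((-1, 0, 1)) for _ in range(60)] for _ in range(40)]
Y = [x[0] * x[1] + 0.5 * x[2] + rng.gauss(0, 0.3) for x in X]
x_tr, y_tr, x_val, y_val = X[:30], Y[:30], X[30:], Y[30:]

h, mse = select_bandwidth(x_tr, y_tr, x_val, y_val, grid=[0.1, 0.5, 1.0, 2.0], lam=1.0)
print("selected h:", h, "hold-out MSE:", round(mse, 3))

# Parametric special case: the linear kernel reproduces ridge regression on markers.
lin_model = rkhs_fit(x_tr, y_tr, linear_kernel, None, 1.0)
lin = rkhs_predict(lin_model, x_tr, x_val, linear_kernel, None)
rid = ridge_predict(x_tr, y_tr, x_val, 1.0)
print("max |RKHS(linear) - ridge|:", max(abs(a - b) for a, b in zip(lin, rid)))
```

The linear-kernel check rests on the identity X'(XX' + lambda I)^{-1} = (X'X + lambda I)^{-1} X'. When there are more markers than lines, the kernel form solves an n x n system instead of a p x p one, which is one source of the computational advantage mentioned above.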
