Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data

TLDR

Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. The study demonstrates how to combine such coefficients from multiple variants to estimate the causal effect of a risk factor on an outcome. Using simulations, the authors compare the bias and efficiency of summarized-data estimates with those from individual-level data, and examine the effects of gene-gene interactions, linkage disequilibrium, and weak instruments. Both the inverse-variance weighted and the likelihood-based summarized-data approaches give estimates and precision similar to two-stage least squares on individual-level data, though they overstate precision when variants are in linkage disequilibrium. Applied to published data on five variants, the methods estimate that a 30% reduction in LDL-C reduces coronary artery disease risk by 67% (95% CI: 54% to 76%). The authors conclude that summarized-data Mendelian randomization with uncorrelated variants is similarly efficient to analysis of individual-level data, although the necessary assumptions cannot be assessed as fully.

Abstract

Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual-level data in simulation studies. We investigate the impact of gene-gene interactions, linkage disequilibrium, and 'weak instruments' on these estimates. Both an inverse-variance weighted average of variant-specific associations and a likelihood-based approach for summarized data give similar estimates and precision to the two-stage least squares method for individual-level data, even when there are gene-gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P-value in a linear regression of the risk factor for each variant is less than 1×10⁻⁵, then weak instrument bias will be small. We use these methods to estimate the causal association of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease using published data on five genetic variants. A 30% reduction in LDL-C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual-level data, although the necessary assumptions cannot be so fully assessed.
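As a rough illustration of the inverse-variance weighted approach mentioned above, the following Python sketch combines per-variant coefficients using the commonly cited formula: estimate = sum(X_k Y_k / s_k^2) / sum(X_k^2 / s_k^2), with standard error sqrt(1 / sum(X_k^2 / s_k^2)), where X_k and Y_k are the variant's associations with the risk factor and the outcome and s_k is the standard error of Y_k. The function name and the input numbers are hypothetical and are not taken from the paper. The sketch assumes uncorrelated variants and ignores uncertainty in X_k, so it would overstate precision for variants in linkage disequilibrium.

```python
import math


def ivw_estimate(bx, by, se_by):
    """Inverse-variance weighted causal estimate from summarized data.

    bx    : per-variant associations with the risk factor (X_k)
    by    : per-variant associations with the outcome (Y_k)
    se_by : standard errors of the outcome associations (s_k)

    Assumes the variants are uncorrelated (no linkage disequilibrium)
    and treats the bx values as known without error.
    """
    if not bx or not (len(bx) == len(by) == len(se_by)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Numerator: sum of X_k * Y_k / s_k^2
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    # Denominator: sum of X_k^2 / s_k^2
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_by))

    estimate = num / den
    std_error = math.sqrt(1.0 / den)
    return estimate, std_error


# Hypothetical summary statistics for three variants (illustration only,
# NOT the five LDL-C variants analysed in the paper).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se_by = [0.01, 0.02, 0.015]

est, se = ivw_estimate(bx, by, se_by)
ci_low, ci_high = est - 1.96 * se, est + 1.96 * se
print(f"IVW estimate: {est:.3f} (95% CI: {ci_low:.3f} to {ci_high:.3f})")
```

Each variant's ratio Y_k / X_k is itself a causal estimate, and the weighted form above is equivalent to averaging those ratios with weights X_k^2 / s_k^2, so variants with stronger risk-factor associations and more precise outcome associations count for more.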
