Power and Sample Size Calculations for Case-Control Genetic Association Tests when Errors Are Present: Application to Single Nucleotide Polymorphisms

TLDR

The study aims to quantify how genotyping errors affect statistical power and the sample size needed to maintain fixed Type I and Type II error rates in case‑control genetic association studies of SNPs. The authors model three published genotyping‑error mechanisms, derive marker genotype frequencies conditional on disease status under both genetic model‑based and model‑free frameworks, and compute the asymptotic power of the 2 × 3 chi-square test through its non‑centrality parameter, which links sample size to the genotyping error rates. For a dominant disease model, they also examine how sample size depends on linkage disequilibrium (LD), marker allele frequencies, and error rates. They find that higher genotyping error increases the required sample size: each 1% increase in the sum of genotyping error rates requires case and control samples that are 2–8% larger, depending on the error model. Under the dominant model, sample size is a nonlinear function of LD and error rate and is largest when LD is low and the error rate is high.
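The power calculation summarized above relies on a standard asymptotic result. Under the alternative, the Pearson chi-square statistic for the 2 × 3 table is approximately non-central chi-square with 2 degrees of freedom, and its non-centrality parameter grows linearly with the total sample size N. The sketch below illustrates that generic approach using only the standard library. It is not the authors' code. The helper names (`ncp_per_subject`, `ncx2_sf_df2`, `required_ncp`, `total_sample_size`) are hypothetical, the case fraction is assumed fixed (0.5 by default), and the genotype frequencies in the usage block are made up.

```python
import math


def ncp_per_subject(case_freqs, control_freqs, case_fraction=0.5):
    """Pearson chi-square non-centrality per subject for a 2 x 3 genotype table.

    The study-level non-centrality parameter is N times this value, where N
    is the total number of cases plus controls (standard asymptotic form).
    """
    w = (case_fraction, 1 - case_fraction)
    col = [w[0] * a + w[1] * u for a, u in zip(case_freqs, control_freqs)]
    total = 0.0
    for wi, freqs in zip(w, (case_freqs, control_freqs)):
        for f, c in zip(freqs, col):
            if c > 0:
                # (p_ij - p_i. p_.j)^2 / (p_i. p_.j) with p_ij = wi * f
                total += wi * (f - c) ** 2 / c
    return total


def ncx2_sf_df2(x, lam, terms=500):
    """P(X > x) for a non-central chi-square with 2 df and non-centrality lam.

    Uses the Poisson(lam / 2) mixture of central chi-squares with 2 + 2j df;
    each even-df tail is exp(-x/2) * sum_{i<=j} (x/2)^i / i!.
    """
    half_x, half_lam = x / 2.0, lam / 2.0
    weight = math.exp(-half_lam)  # Poisson weight for j = 0
    term, partial, total = 1.0, 0.0, 0.0
    for j in range(terms):
        if j > 0:
            weight *= half_lam / j
            term *= half_x / j
        partial += term
        total += weight * math.exp(-half_x) * partial
    return total


def required_ncp(alpha=0.05, power=0.80, hi=100.0, tol=1e-9):
    """Smallest non-centrality reaching the target power at level alpha (2 df)."""
    crit = -2.0 * math.log(alpha)  # exact 2-df central chi-square critical value
    lo = 0.0
    while hi - lo > tol:  # power is increasing in lam, so bisect
        mid = (lo + hi) / 2.0
        if ncx2_sf_df2(crit, mid) < power:
            lo = mid
        else:
            hi = mid
    return hi


def total_sample_size(case_freqs, control_freqs, case_fraction=0.5,
                      alpha=0.05, power=0.80):
    """Total N (cases + controls) such that N * ncp_per_subject >= required_ncp."""
    per_subject = ncp_per_subject(case_freqs, control_freqs, case_fraction)
    return math.ceil(required_ncp(alpha, power) / per_subject)


if __name__ == "__main__":
    # Made-up genotype frequencies (MM, Mm, mm), for illustration only
    cases = [0.36, 0.48, 0.16]
    controls = [0.25, 0.50, 0.25]
    print(round(required_ncp(), 2))
    print(total_sample_size(cases, controls))
```

With 2 degrees of freedom, both the central and non-central tail probabilities have closed forms, so the sketch needs no external dependencies. If SciPy is available, `scipy.stats.chi2.ppf` and `scipy.stats.ncx2.sf` would serve the same purpose.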

Abstract

The purpose of this work is to quantify the effects that errors in genotyping have on power and the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) for case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published models of genotyping errors on the chi-square test for independence in the 2 × 3 table. After specifying genotype frequencies for the marker locus conditional on disease status and error model in both a genetic model-based and a genetic model-free framework, we compute the asymptotic power to detect association through specification of the test’s non-centrality parameter. This parameter determines the functional dependence of SSN on the genotyping error rates. Additionally, we study the dependence of SSN on linkage disequilibrium (LD), marker allele frequencies, and genotyping error rates for a dominant disease model. An increased genotyping error rate requires a larger SSN. Every 1% increase in the sum of genotyping error rates requires that both case and control SSN be increased by 2–8%, with the extent of increase dependent upon the error model. For the dominant disease model, SSN is a nonlinear function of LD and genotyping error rate, with greater SSN for lower LD and higher genotyping error rate. The combination of lower LD and higher genotyping error rates requires a larger SSN than the sum of the SSN for the lower LD and for the higher genotyping error rate.
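The interaction between genotyping error, LD, and a dominant disease model can be seen by pushing true marker genotype frequencies through a misclassification matrix and comparing non-centrality parameters. Once α, power, and the case fraction are fixed, the required total non-centrality is fixed too. The ratio of per-subject non-centralities with and without errors therefore gives the relative SSN directly.

This sketch rests on several assumptions. It assumes Hardy–Weinberg equilibrium, random union of haplotypes, the raw LD coefficient D, and non-differential errors (the same matrix for cases and controls). It also uses a hypothetical symmetric error matrix in which each genotype is misread as each of the other two with probability ε/2. That matrix is purely illustrative and is not one of the three published error models the paper analyzes. All function names and parameter values are hypothetical.

```python
import itertools


def marker_genotype_freqs(p, q, d, pen_carrier, pen_noncarrier):
    """Marker genotype frequencies [MM, Mm, mm] in cases and in controls.

    p: disease-allele frequency; q: frequency of marker allele M;
    d: LD coefficient D = P(disease allele, M) - p * q.
    Dominant model: penetrance pen_carrier with >= 1 disease allele, else
    pen_noncarrier. Assumes HWE / random union of haplotypes.
    """
    haplotypes = {  # (carries disease allele, carries M): frequency
        (1, 1): p * q + d,
        (1, 0): p * (1 - q) - d,
        (0, 1): (1 - p) * q - d,
        (0, 0): (1 - p) * (1 - q) + d,
    }
    cases, controls = [0.0] * 3, [0.0] * 3
    for (h1, f1), (h2, f2) in itertools.product(haplotypes.items(), repeat=2):
        pen = pen_carrier if h1[0] + h2[0] >= 1 else pen_noncarrier
        g = 2 - (h1[1] + h2[1])  # 0 = MM, 1 = Mm, 2 = mm
        cases[g] += f1 * f2 * pen
        controls[g] += f1 * f2 * (1 - pen)
    n_a, n_u = sum(cases), sum(controls)
    return [c / n_a for c in cases], [c / n_u for c in controls]


def symmetric_error_matrix(eps):
    """Hypothetical model: P(observe j | true i) = 1 - eps if i == j, else eps / 2."""
    return [[1 - eps if i == j else eps / 2 for j in range(3)] for i in range(3)]


def apply_errors(freqs, error_matrix):
    """Observed frequencies: obs[j] = sum_i true[i] * P(observe j | true i)."""
    return [sum(freqs[i] * error_matrix[i][j] for i in range(3)) for j in range(3)]


def ncp_per_subject(case_freqs, control_freqs, case_fraction=0.5):
    """Pearson chi-square non-centrality per subject for the 2 x 3 table."""
    w = (case_fraction, 1 - case_fraction)
    col = [w[0] * a + w[1] * u for a, u in zip(case_freqs, control_freqs)]
    total = 0.0
    for wi, freqs in zip(w, (case_freqs, control_freqs)):
        for f, c in zip(freqs, col):
            if c > 0:
                total += wi * (f - c) ** 2 / c
    return total


def relative_ssn(cases, controls, eps):
    """SSN with errors divided by SSN without errors (same alpha, power, case fraction)."""
    e = symmetric_error_matrix(eps)
    observed = ncp_per_subject(apply_errors(cases, e), apply_errors(controls, e))
    return ncp_per_subject(cases, controls) / observed


if __name__ == "__main__":
    # Illustrative parameter values only
    cases, controls = marker_genotype_freqs(p=0.2, q=0.3, d=0.08,
                                            pen_carrier=0.2, pen_noncarrier=0.05)
    for eps in (0.0, 0.01, 0.02, 0.05):
        print(f"eps = {eps:.2f}  relative SSN = {relative_ssn(cases, controls, eps):.3f}")
```

To try a specific published error model, replace `symmetric_error_matrix` with that model's matrix; the rest of the pipeline is unchanged. Varying `d` alongside `eps` shows how SSN responds jointly to LD and error rate.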
