Publication | Open Access
Privacy-Preserving Data Sharing for Genome-Wide Association Studies
Citations: 125
References: 23
Year: 2013
Privacy Protection, Engineering, Privacy-preserving Techniques, Genetics, Privacy-preserving Data Sharing, Genomics, Genome-wide Association Studies, Data Science, Data Anonymization, Statistical Computing, New Methods, Biostatistics, Data Sharing, Data Management, Statistics, Traditional Statistical Methods, Confidentiality Protection, Quantitative Genetics, Data Privacy, Statistical Genetics, Differential Privacy, Privacy, Data Security, Privacy Preservation, Statistical Inference, Medicine
Traditional confidentiality methods do not scale well to genome-wide association study (GWAS) databases and offer weak guarantees against linkage to external data, while differential privacy offers rigorous guarantees, possibly at a serious cost in data utility. The authors aim to develop methods for releasing aggregate GWAS data that preserve individual privacy. They introduce differentially private techniques for releasing minor allele frequencies, chi-square statistics, and p-values, evaluate them on simulated data and on a canine hair-length GWAS of 685 dogs, and propose a privacy-preserving penalized logistic regression for discovering genome-wide associations.
Traditional statistical methods for confidentiality protection of statistical databases do not scale well to GWAS databases, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, provides a rigorous definition of privacy with meaningful guarantees in the presence of arbitrary external information, although those guarantees may come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual’s privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics, and p-values. We compare these approaches on simulated data and on a GWAS of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially private approach to penalized logistic regression.
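To make the idea of a differentially private allele-frequency release concrete, here is a minimal sketch using the generic Laplace mechanism. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function names, the 0/1/2 genotype encoding, the choice to count a pre-specified allele (fixed before looking at the data), and the "replace one individual's record" notion of neighboring datasets are all assumptions made here for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_frequencies(genotypes, epsilon, seed=None):
    """Release noisy allele frequencies for every SNP (hypothetical helper).

    genotypes: one row per individual; each entry is 0, 1, or 2, the number
               of copies of the counted allele at that SNP (assumed encoding).
    epsilon:   total privacy budget for the whole released vector.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not genotypes or not genotypes[0]:
        raise ValueError("genotypes must be a non-empty matrix")

    n = len(genotypes)      # number of individuals (treated as public)
    m = len(genotypes[0])   # number of SNPs released
    if any(len(row) != m for row in genotypes):
        raise ValueError("all rows must have the same number of SNPs")

    rng = random.Random(seed)

    # Exact frequency at SNP j: allele copies divided by 2n chromosomes.
    exact = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]

    # Replacing one individual's record changes each allele count by at
    # most 2, so each frequency moves by at most 2 / (2n) = 1 / n. Across
    # m SNPs, the L1 sensitivity of the vector is therefore at most m / n.
    scale = (m / n) / epsilon

    noisy = [f + laplace_noise(scale, rng) for f in exact]

    # Clamping to [0, 1] is post-processing and does not weaken the guarantee.
    return [min(1.0, max(0.0, x)) for x in noisy]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 0], [1, 2, 1]]  # 4 individuals, 3 SNPs
    print(private_allele_frequencies(toy, epsilon=1.0, seed=42))
```

Because the noise scale grows with the number of SNPs released and shrinks with the cohort size, a small study such as the 685-dog example would receive substantial noise if many SNPs were released at once. This is one way to see the tension between privacy and utility that the abstract describes.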