Publication | Open Access
The variant call format provides efficient and robust storage of GWAS summary statistics
Citations: 33 | References: 37 | Year: 2020
Unknown Venue
Variant Call Format, Genetics, Genomics, High Throughput Sequencing, Genome-wide Association Studies, Clinical Genetics, Genome-wide Association Study, Summary Statistics, Robust Storage, GWAS Data, Human Phenotypes, Computational Genomics, Statistical Computing, Biostatistics, GWAS Summary Statistics, Whole Genome Studies, Public Health, Molecular Diagnostics, Statistics, Variant Interpretation, Personal Genomics, Statistical Genetics, Population Genetics, Sequencing, Bioinformatics, Epidemiology, Next-generation Sequencing, Medicine
Genome-wide association study (GWAS) summary statistics are a fundamental resource for a variety of research applications [1–6]. Yet despite their widespread utility, no common storage format has been widely adopted, which hinders tool development as well as data sharing, analysis and integration. Existing tabular formats [7,8] often store information about genetic variants and their associations ambiguously or incompletely, and they lack essential metadata, increasing the possibility of errors in data interpretation and post-GWAS analyses. Additionally, data in these formats are typically not indexed, so the whole file must be read to answer a query, which is computationally inefficient. To address these issues, we propose an adaptation of the variant call format [9] (GWAS-VCF) and have produced a suite of open-source tools for using this format in downstream analyses. Simulation studies show that GWAS-VCF is 9 to 46 times faster than tabular alternatives when extracting variant(s) by genomic position. Our results demonstrate that GWAS-VCF provides a robust and performant solution for sharing, analysis and integration of GWAS data. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (available from: https://gwas.mrcieu.ac.uk ).
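The cost difference between indexed and unindexed lookup by genomic position can be illustrated with a minimal Python sketch. It is not the GWAS-VCF tooling or its on-disk index: the records, the field layout, and the function names (`scan_region`, `build_index`, `indexed_region`) are hypothetical and exist only to contrast a full scan with a binary search over sorted positions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical summary-statistic rows: (chromosome, position, effect, p-value),
# sorted by chromosome and then by position. Values are made up for illustration.
RECORDS = [
    ("1", 1000, 0.02, 0.40),
    ("1", 2500, -0.10, 1e-9),
    ("1", 4000, 0.05, 0.03),
    ("2", 1500, 0.01, 0.75),
    ("2", 3000, 0.12, 2e-6),
]


def scan_region(records, chrom, start, end):
    """Unindexed lookup: every row is inspected, as with a plain tabular file."""
    return [r for r in records if r[0] == chrom and start <= r[1] <= end]


def build_index(records):
    """Map each chromosome to parallel lists of sorted positions and row offsets."""
    index = {}
    for offset, (chrom, pos, _effect, _pval) in enumerate(records):
        positions, offsets = index.setdefault(chrom, ([], []))
        positions.append(pos)
        offsets.append(offset)
    return index


def indexed_region(records, index, chrom, start, end):
    """Indexed lookup: binary search finds the slice of rows inside [start, end]."""
    if chrom not in index:
        return []
    positions, offsets = index[chrom]
    lo = bisect_left(positions, start)   # first position >= start
    hi = bisect_right(positions, end)    # one past the last position <= end
    return [records[i] for i in offsets[lo:hi]]


if __name__ == "__main__":
    idx = build_index(RECORDS)
    print(indexed_region(RECORDS, idx, "1", 2000, 4000))
```

Both functions return the same rows; the indexed version touches only the rows in the requested region once the index has been built, whereas the scan reads every row for every query.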