Publication | Open Access
Creating a universal SNP and small indel variant caller with deep neural networks
93 Citations | 32 References | Year: 2016 | Unknown Venue
Engineering, Machine Learning, Deep Whole Genomes, Genetics, Genomics, Genetic Medicine, Data Science, Computational Genomics, Ground-truth Genotype, Molecular Diagnostics, Universal SNP, Computer Science, Deep Learning, Sequencing, Bioinformatics, Deep Neural Networks, Next-generation Sequencing, Computational Biology, Medicine
Abstract

Next-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual's genome [1] by calling the genetic variants that individual carries from billions of short, errorful sequence reads [2]. Despite more than a decade of effort by thousands of dedicated researchers, the hand-crafted, parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome [3,4]. Here we show that a deep convolutional neural network [5] can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the "highest performance" award for SNPs in an FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground-truth data. We further show that, unlike existing tools that perform well only on a specific technology, DeepVariant can learn to call variants across a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.
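The abstract frames variant calling as mapping an image of a read pileup to genotype likelihoods. The plain-Python sketch below illustrates only that framing. It turns reads already aligned to a short reference window into a fixed-size rows × columns × channels grid, then converts three raw scores into likelihoods for hom-ref, het, and hom-alt genotypes with a softmax. Everything concrete here is an assumption for illustration, not DeepVariant's actual encoding or code: the channel choice (base identity, base quality, mismatch flag), the scaling constants, the three-class output, and the hypothetical names `encode_pileup`, `genotype_likelihoods`, and `BASE_VALUES`. The trained convolutional network that would map the grid to scores is omitted, and the demo scores are made-up numbers.

```python
import math
from typing import List

# Hypothetical per-base values for channel 0; NOT DeepVariant's real encoding.
BASE_VALUES = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def encode_pileup(ref: str, reads: List[str], quals: List[List[int]],
                  max_reads: int) -> List[List[List[float]]]:
    """Encode reads already aligned to `ref` as a [rows][columns][channels] grid.

    Row 0 holds the reference; rows 1..max_reads hold one read each, and
    unused rows stay zero. Channels (an illustrative assumption):
      0: base identity scaled into (0, 1]; 0.0 for gaps or unknown bases
      1: base quality divided by 60, capped at 1.0
      2: 1.0 if the read base differs from the reference base, else 0.0
    Each quality list is assumed to be at least as long as its read.
    """
    width = len(ref)
    image = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(max_reads + 1)]
    for j, base in enumerate(ref):
        image[0][j][0] = BASE_VALUES.get(base, 0.0)
        image[0][j][1] = 1.0  # the reference row is treated as fully confident
    for i, (read, qual) in enumerate(zip(reads[:max_reads], quals[:max_reads]), start=1):
        for j, base in enumerate(read[:width]):
            if base not in BASE_VALUES:
                continue  # '-' or 'N': leave this pixel at zero
            image[i][j][0] = BASE_VALUES[base]
            image[i][j][1] = min(qual[j] / 60.0, 1.0)
            image[i][j][2] = 1.0 if base != ref[j] else 0.0
    return image


def genotype_likelihoods(logits: List[float]) -> List[float]:
    """Numerically stable softmax over [hom-ref, het, hom-alt] scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    ref = "ACGTA"
    reads = ["ACGTA", "ACTTA", "ACTTA"]
    quals = [[30] * 5, [30] * 5, [60] * 5]
    image = encode_pileup(ref, reads, quals, max_reads=4)
    print(len(image), len(image[0]), len(image[0][0]))  # 5 5 3
    print(image[2][2])  # [1.0, 0.5, 1.0]: a T where the reference has G
    # In the described method a trained CNN would produce these scores from
    # the image; here they are made-up numbers used only to exercise softmax.
    print(genotype_likelihoods([0.1, 2.0, 0.5]))
```

Because the grid is always (max_reads + 1) rows by the window width, every candidate site yields an input of the same shape, which is what lets a standard image classifier be applied; the window width, read cap, and channels used by the real tool may differ from this sketch.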