Publication | Open Access
DNAGPT: A Generalized Pre-trained Tool for Multiple DNA Sequence Analysis Tasks
Citations: 55
References: 37
Year: 2023
Unknown Venue
Engineering, Genetics, Molecular Biology, mRNA Abundance Regression, Genomics, Sequence Alignment, Gene Recognition, Classic GPT Model, High Throughput Sequencing, Language Processing, Natural Language Processing, Generalized Pre-trained Tool, Data Science, Computational Genomics, DNAGPT, DNA Sequencing, Sequence Modelling, Sequence Analysis, Pre-trained Models, Functional Genomics, Bioinformatics, Computational Biology, Medicine, Genome Editing, Sequence Assembly
Abstract
Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genome generation tasks demonstrates DNAGPT’s superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.
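The two auxiliary pre-training targets named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `gc_content` and `order_example` are hypothetical, and treating the sequence-order task as "original versus reversed sequence" is an assumption made here for illustration, since the abstract does not define how order labels are built.

```python
import random


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases in seq.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / valid


def order_example(seq: str, rng: random.Random) -> tuple[str, int]:
    """Build one hypothetical sequence-order training pair.

    Assumption: label 1 means the sequence is kept in its original order,
    label 0 means it was reversed. The abstract does not specify this.
    """
    if rng.random() < 0.5:
        return seq, 1
    return seq[::-1], 0


if __name__ == "__main__":
    rng = random.Random(0)
    fragment = "ATGCGGCCTA"
    print("GC content:", gc_content(fragment))
    print("order pair:", order_example(fragment, rng))
```

In a setup like this, the GC fraction would serve as the numerical regression target and the 0/1 label as the binary classification target for the same input fragment.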