Publication | Open Access
Ten common issues with reference sequence databases and how to mitigate them
Citations: 55
References: 83
Year: 2024
Engineering, Genetics, Reference Sequence Databases, Genomics, Sequence Alignment, Metagenomic Classification, Bioinformatics Database, High Throughput Sequencing, Database System, Data Science, Molecular Ecology, Reference Data, Data Integration, Data Management, Metagenomic Sequencing, Metagenomic Tools, Sequence Analysis, Omics, Computer Science, Microbiome, Database Technology, Bioinformatics, Sequencing, Biology, Common Issues, Next-generation Sequencing, Biological Databases, Genome Sequencing, Microbiology, Medicine, Data Modeling
Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.