
TLDR

Genomics enables comprehensive surveying of genomes, but assessing their completeness is difficult because technologies change rapidly and data volumes keep growing. BUSCO addresses this by quantifying the expected gene content of a data set. This work introduces BUSCO v3, which features a complete refactoring of the code for flexibility and high-throughput assessment. It also expands the lineage data sets: the original six were updated with improved species sampling, 34 new subsets for vertebrates, arthropods, fungi, and prokaryotes improve resolution, and new data sets cover nematodes, protists, and plants. Example analyses highlight uses of BUSCO assessments beyond data quality control, including comparative genomics, gene predictor training, metagenomics, and phylogenomics.

Abstract

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
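BUSCO reports completeness as percentages of the n orthologs in the chosen lineage data set. Each ortholog falls into one category: complete (either single-copy or duplicated), fragmented, or missing. The sketch below shows that arithmetic. It is a minimal illustration, not BUSCO's own code. The function name `summarize`, the status labels, and the one-decimal rounding are assumptions made for this example, and BUSCO's actual output may round differently.

```python
from collections import Counter

# Hypothetical status labels mirroring BUSCO's four assessment categories.
COMPLETE_SINGLE = "Complete"
COMPLETE_DUPLICATED = "Duplicated"
FRAGMENTED = "Fragmented"
MISSING = "Missing"
VALID = {COMPLETE_SINGLE, COMPLETE_DUPLICATED, FRAGMENTED, MISSING}


def summarize(statuses: dict[str, str]) -> str:
    """Build a BUSCO-style C:[S,D],F,M,n summary string (illustrative sketch).

    `statuses` maps every ortholog ID in the lineage data set to one status
    label, so n is the size of the mapping and the categories partition n.
    """
    n = len(statuses)
    if n == 0:
        raise ValueError("lineage data set is empty")
    counts = Counter(statuses.values())
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")

    def pct(k: int) -> float:
        # Share of the n expected orthologs, as a percentage.
        return 100.0 * k / n

    s = counts[COMPLETE_SINGLE]
    d = counts[COMPLETE_DUPLICATED]
    f = counts[FRAGMENTED]
    m = counts[MISSING]
    c = s + d  # "complete" counts both single-copy and duplicated hits
    return (
        f"C:{pct(c):.1f}%[S:{pct(s):.1f}%,D:{pct(d):.1f}%],"
        f"F:{pct(f):.1f}%,M:{pct(m):.1f}%,n:{n}"
    )


# Usage with ten hypothetical orthologs.
example = {f"busco_{i}": COMPLETE_SINGLE for i in range(7)}
example.update({"busco_7": COMPLETE_DUPLICATED,
                "busco_8": FRAGMENTED,
                "busco_9": MISSING})
print(summarize(example))  # C:80.0%[S:70.0%,D:10.0%],F:10.0%,M:10.0%,n:10
```

Reporting duplicated orthologs separately from single-copy ones matters because a high duplication rate can indicate assembly artifacts or an unresolved ploidy, even when the overall "complete" figure looks good.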
