MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

TLDR

MEGAHIT is a de novo assembler designed to assemble large, complex metagenomic datasets quickly and cost-efficiently. It assembles the data as a whole, without pre-processing such as partitioning or normalization. On a 252 Gbp soil metagenomic dataset, MEGAHIT finished in 44.1 h on a single node with a GPU (99.6 h without). Compared with previous methods on the same data, it produced an assembly three times larger, with a longer contig N50 and average contig length, and 55.8% of reads aligned to the assembly, a fourfold improvement.

Abstract

MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a 252 Gbp soil metagenomics dataset in 44.1 h and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MEGAHIT assembles the data as a whole, i.e. no pre-processing such as partitioning or normalization is needed. Compared with previous methods on the soil data, MEGAHIT generated a threefold larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads aligned to the assembly, a fourfold improvement.
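
The abstract compares assemblies by contig N50 and average contig length. As a minimal sketch of how these two metrics are conventionally computed (this is not code from MEGAHIT; the function names and the example lengths are hypothetical and chosen only for illustration), N50 is the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def mean_length(contig_lengths):
    """Return the average contig length (in bases)."""
    lengths = list(contig_lengths)
    if not lengths:
        raise ValueError("need at least one contig")
    return sum(lengths) / len(lengths)


# Made-up contig lengths, not data from the paper.
example = [100, 200, 300, 400, 500]
print(n50(example), mean_length(example))
```

Because N50 weights long contigs more heavily than the plain average does, the two numbers are usually reported together, as in the abstract.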
