Publication | Open Access

MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Citations: 9K

References: 8

Year: 2015

TLDR

MEGAHIT is a de novo assembler designed to assemble large, complex metagenomic datasets quickly and cost-efficiently. It assembles the data as a whole, without pre-processing such as partitioning or normalization. On a soil metagenomic dataset of 252 Gbp (giga base pairs), MEGAHIT finished in 44.1 h on a single computing node with a GPU and in 99.6 h without one. Compared with previous methods on the same data, it produced an assembly three times larger, with a longer contig N50 and average contig length, and 55.8% of the reads aligned to the assembly, a fourfold improvement.

Abstract

MEGAHIT is an NGS (next-generation sequencing) de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset of 252 Gbp (giga base pairs) in 44.1 h and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MEGAHIT assembles the data as a whole, i.e. no pre-processing such as partitioning or normalization is needed. When compared with previous methods on the soil data, MEGAHIT generated an assembly three times larger, with a longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, a fourfold improvement.
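
The contig statistics named above (N50 and average contig length) can be illustrated with a minimal Python sketch. It uses the common definition of N50: the largest length L such that contigs of length at least L together cover at least half of the total assembly size. The function names and the toy contig lengths are hypothetical and chosen only for illustration; this is not code or data from MEGAHIT.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running sum first reaches half of the
    total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length


def mean_length(lengths):
    """Return the average contig length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    return sum(lengths) / len(lengths)


# Hypothetical toy assembly (contig lengths in base pairs), not real data.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)            # 500 + 400 = 900 >= 1500 / 2, so 400
toy_mean = mean_length(toy_contigs)   # 1500 / 5 = 300.0
```

Because N50 is weighted by length, a handful of long contigs raises it much more than many short ones do, which is why it is usually reported alongside the average contig length and the total assembly size.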