Publication | Open Access

FLASH: fast length adjustment of short reads to improve genome assemblies

Year: 2011

Citations: 15.1K

References: 10

TLDR

Next-generation sequencing produces vast numbers of short reads, yet their limited length hampers de novo genome assembly even with deep coverage. The authors present FLASH, a fast tool that lengthens short reads by overlapping and merging paired-end reads from libraries whose fragments are shorter than twice the read length. Its correctness was tested on one million simulated read pairs, and it was then used as a pre-processor for Illumina assemblies of *Staphylococcus aureus* and human chromosome 14. FLASH merged reads correctly more than 99% of the time on simulated reads with an error rate below 1%, and, with adequately set parameters, more than 90% of the time when reads contained up to 5% errors. Assemblies built from FLASH-extended reads had substantially greater N50 lengths for both contigs and scaffolds. FLASH is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash; contact t.magoc@gmail.com.
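The N50 statistic used above to compare assemblies can be illustrated with a short calculation. The sketch below uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length); the function name `n50` and the sample lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines N50.
        if running >= half_total:
            return length
    raise ValueError("need at least one contig length")


# Illustrative lengths only: total is 54, half is 27, and 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

A larger N50 means that more of the assembly sits in long contigs, which is why the paper reports it as the measure of improvement.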

Abstract

Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.

Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.

Availability and Implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.

Contact: t.magoc@gmail.com
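The idea of overlapping and merging a read pair can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' C implementation: it ignores base quality scores, keeps the first read's bases wherever the two reads disagree, and picks the overlap with the lowest mismatch ratio. The function names and the parameters `min_overlap` and `max_mismatch_ratio`, along with their default values, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a paired-end read pair into one longer read, or return None.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before the two reads are compared.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    longest = min(len(read1), len(mate))
    # Try every candidate overlap: suffix of read1 against prefix of mate.
    for overlap in range(longest, min_overlap - 1, -1):
        suffix = read1[-overlap:]
        prefix = mate[:overlap]
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        ratio = mismatches / overlap
        # Keep the overlap with the lowest mismatch ratio; because longer
        # overlaps are tried first, ties go to the longer overlap.
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None
    # Simplification: the overlap region is taken from read1 unchanged.
    return read1 + mate[best[1]:]


# A 30-base fragment sequenced with two 20-base reads overlaps by 10 bases.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
print(merge_pair(read1, read2) == fragment)  # prints True
```

The merge only works when the fragment is shorter than twice the read length, as the abstract notes; otherwise the two reads share no bases and the function returns `None`.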
