Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads

TLDR

High-volume DNA/RNA sequencing is now routine, yet mapping short reads to a reference genome remains difficult because of high data volumes, large genomes, and sequence variation. Existing tools excel in either speed or sensitivity, but not both. The authors introduce Stampy, a read mapper that combines a hybrid mapping algorithm with a detailed statistical model to achieve both, particularly when reads contain sequence variation such as short indels. The result is a higher usable sequence yield and improved accuracy compared with existing software.

Abstract

High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences (“reads”) resulting from a sequencing run is first “mapped” (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher usable sequence yield and improved accuracy compared to that of existing software.
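To make the mapping task concrete, the following is a minimal sketch of a generic seed-and-vote approach: index every k-mer of the reference, look up the read's k-mers, and score the candidate start positions they suggest. It is not Stampy's algorithm, and it is not taken from the paper. The function names, the toy sequences, and the choice of k = 4 are all assumptions made for illustration. The sketch only considers ungapped placements, so it would miss exactly the indel-containing reads that the abstract identifies as the hard case.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Hypothetical helper: each k-mer of the read that occurs in the index
    votes for the read start position it implies; every voted position is
    then scored by counting mismatched bases.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            # Keep only placements that lie fully inside the reference.
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1

    best = None
    for start in votes:
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


# Toy data (assumed): the read is the reference from position 9 onward,
# 14 bases long, with one substituted base.
reference = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
read = "GGCTTACCGTTAGG"
index = build_kmer_index(reference, k=4)
result = map_read(read, reference, index, k=4)
print(result)
```

A production mapper would additionally need gapped alignment and a way to express confidence in each placement, which is where a statistical model of the kind the abstract describes comes in.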
