Publication | Open Access
Mapping short DNA sequencing reads and calling variants using mapping quality scores
Citations: 2.7K
References: 26
Year: 2008
Short‑read sequencing technologies generate very short reads that challenge accurate alignment to reference genomes, necessitating new algorithms and software. The authors propose mapping quality as a confidence metric indicating whether a read truly originates from its aligned position. MAQ implements this metric by leveraging mate‑pair information and a Bayesian model that combines mapping quality, raw sequence error probabilities, haplotype sampling, and empirical error correlations to produce genotype calls. MAQ demonstrates accuracy, efficiency, versatility, and user‑friendliness, and is freely available online.
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
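To make the idea of mapping quality concrete, the following is a minimal Python sketch, not MAQ's actual implementation. It assumes the conventional Phred scaling, Q = -10 log10(P(alignment is wrong)), a uniform prior over candidate positions, and a likelihood for each candidate proportional to 10^(-q/10), where q is the sum of base qualities at mismatched positions. The function names `phred` and `mapping_quality` and the cap of 99 are hypothetical choices for illustration only.

```python
import math


def phred(p_error: float, cap: int = 99) -> int:
    """Phred-scale an error probability: Q = -10 * log10(p), capped at `cap`."""
    if p_error <= 0.0:
        return cap  # no competing hit: report the (assumed) maximum quality
    return min(cap, int(round(-10.0 * math.log10(p_error))))


def mapping_quality(mismatch_quality_sums: list[float], best_index: int) -> int:
    """Illustrative mapping quality for the hit at `best_index`.

    Each entry of `mismatch_quality_sums` is the sum of base qualities at the
    mismatches of one candidate alignment. Assumes a uniform prior over the
    candidates and a likelihood proportional to 10^(-q/10) for each.
    """
    weights = [10.0 ** (-q / 10.0) for q in mismatch_quality_sums]
    total = sum(weights)
    # Posterior probability that the read comes from some other candidate.
    p_wrong = sum(w for i, w in enumerate(weights) if i != best_index) / total
    return phred(p_wrong)


if __name__ == "__main__":
    # A perfect hit competing with a hit whose mismatches sum to quality 30.
    print(mapping_quality([0.0, 30.0], best_index=0))
    # Two equally good hits: the read placement is ambiguous.
    print(mapping_quality([0.0, 0.0], best_index=0))
```

Under these assumptions, a read with two equally good placements receives a very low score, while a read whose best placement is far better than any alternative receives a high one. This sketch only scores the candidates it is given and does not account for positions the aligner failed to find.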