Concepedia

Publication | Open Access

TopHat: discovering splice junctions with RNA-Seq

Citations: 12K

References: 23

Year: 2009

TLDR

RNA-Seq produces millions of short reads that can be used to measure gene expression and reveal novel splice variants, but the aligners available when the paper was written relied on known splice junctions and could not discover new ones. TopHat aligns RNA-Seq reads to a reference genome without relying on known splice sites. On reads from a mammalian RNA-Seq experiment, it recovered more than 72% of the splice junctions reported by that study's annotation-based software and found nearly 20 000 previously unreported junctions. It maps nearly 2.2 million reads per CPU hour, which is enough to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. The authors also note several challenges specific to ab initio splice site discovery that will require further algorithm development. TopHat is free, open-source software available from http://tophat.cbcb.umd.edu (contact: cole@cs.umd.edu); supplementary data are available at Bioinformatics online.

Abstract

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu

Contact: cole@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
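The throughput claim in the abstract can be checked with simple arithmetic. The sketch below is only an illustration: the rate of 2.2 million reads per CPU hour comes from the abstract, while the function names, the single-core assumption and the 50-million-read experiment size are hypothetical values chosen for the example, not figures from the paper.

```python
# Rate quoted in the abstract ("nearly 2.2 million reads per CPU hour").
READS_PER_CPU_HOUR = 2_200_000


def cpu_hours_needed(total_reads: int, rate: int = READS_PER_CPU_HOUR) -> float:
    """Estimate CPU hours to map `total_reads` at a constant rate.

    Assumes the rate stays constant for the whole run, which real
    workloads will only approximate.
    """
    if total_reads < 0 or rate <= 0:
        raise ValueError("total_reads must be >= 0 and rate must be > 0")
    return total_reads / rate


def reads_in_budget(cpu_hours: int, rate: int = READS_PER_CPU_HOUR) -> int:
    """Estimate how many reads fit into a whole number of CPU hours."""
    if cpu_hours < 0 or rate <= 0:
        raise ValueError("cpu_hours must be >= 0 and rate must be > 0")
    return cpu_hours * rate


if __name__ == "__main__":
    # Hypothetical experiment size, chosen only for illustration.
    example_reads = 50_000_000
    hours = cpu_hours_needed(example_reads)
    print(f"{example_reads:,} reads need about {hours:.1f} CPU hours")
    print(f"One CPU for 24 hours maps about {reads_in_budget(24):,} reads")
```

At the quoted rate, a single CPU would map roughly 52.8 million reads in 24 hours, so an experiment of a few tens of millions of reads fits the "less than a day" claim under these assumptions.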
