The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads

TLDR

Rsubread is a Bioconductor package that brings the Subread suite into R, providing high-performance alignment and read counting for RNA-seq data. It integrates read mapping and quantification in a single package with no software dependencies other than R, detects exon-exon junctions de novo, quantifies expression at the gene, exon, or exon-junction level, and returns the read-count matrix directly as an R object for downstream Bioconductor analysis. Using SEQC data and simulations, the authors benchmark it against TopHat2, STAR, and HTSeq, as well as counting functions in the Bioconductor infrastructure packages. In these benchmarks Rsubread is faster, uses less memory, and yields read-count summaries that correlate more accurately with true values than the competing tools.

Abstract

We present Rsubread, a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads. Rsubread is based on the successful Subread suite with the added ease-of-use of the R programming environment, creating a matrix of read counts directly as an R object ready for downstream analysis. It integrates read mapping and quantification in a single package and has no software dependencies other than R itself. We demonstrate Rsubread's ability to detect exon-exon junctions de novo and to quantify expression at the level of either genes, exons or exon junctions. The resulting read counts can be input directly into a wide range of downstream statistical analyses using other Bioconductor packages. Using SEQC data and simulations, we compare Rsubread to TopHat2, STAR and HTSeq as well as to counting functions in the Bioconductor infrastructure packages. We consider the performance of these tools on the combined quantification task starting from raw sequence reads through to summary counts, and in particular evaluate the performance of different combinations of alignment and counting algorithms. We show that Rsubread is faster and uses less memory than competitor tools and produces read count summaries that more accurately correlate with true values.
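
The accuracy claim above rests on comparing each pipeline's read counts with known true values. The sketch below illustrates one way such a comparison could be scored. It is not the authors' evaluation code: the choice of Pearson correlation on log2 counts, the pseudocount of 0.5, the function names, the pipeline labels, and the toy numbers are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log2_counts(counts, pseudocount=0.5):
    """Log-transform counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def rank_pipelines(true_counts, estimated):
    """Return (pipeline name, correlation with truth) pairs, best first."""
    truth = log2_counts(true_counts)
    scores = {
        name: pearson(truth, log2_counts(counts))
        for name, counts in estimated.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-gene counts, invented purely for illustration.
true_counts = [0, 10, 50, 200, 1000]
estimated = {
    "pipeline_a": [0, 11, 48, 205, 990],
    "pipeline_b": [5, 2, 80, 120, 1500],
}

ranking = rank_pipelines(true_counts, estimated)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

In a real benchmark, each entry of `estimated` would hold the per-gene counts produced by one aligner and counter combination, and `true_counts` would come from the simulation's known values.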
