
Publication | Open Access

RSeQC: quality control of RNA-seq experiments

Citations: 2.9K

References: 7

Year: 2012

TLDR

RNA-seq is widely used for transcriptome studies, but quality control is essential and also time-consuming and complex because of the data's size and diversity. The authors aim to provide a convenient, comprehensive QC tool for RNA-seq data. RSeQC is a package written in Python and C that accepts SAM/BAM alignment files and BED gene-model files. Most of its modules use R scripts to visualize QC metrics, including sequence quality, GC bias, PCR bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity, and read distribution over the genome structure. It efficiently handles large files containing hundreds of millions of alignments. Source code and a user manual are freely available at http://code.google.com/p/rseqc/ (contact: WL1@bcm.edu), and supplementary data are available at Bioinformatics online.

Abstract

Motivation: RNA-seq has been extensively used for transcriptome study. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task, due to the massive size and versatile nature of RNA-seq data. Therefore, a convenient and comprehensive QC tool to assess RNA-seq quality is sorely needed.

Results: We developed the RSeQC package to comprehensively evaluate different aspects of RNA-seq experiments, such as sequence quality, GC bias, polymerase chain reaction bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity and read distribution over the genome structure. RSeQC takes as input both SAM and BAM files, which can be produced by most RNA-seq mapping tools, as well as BED files, which are widely used for gene models. Most modules in RSeQC take advantage of R scripts for visualization, and they are notably efficient in dealing with large BAM/SAM files containing hundreds of millions of alignments.

Availability and implementation: RSeQC is written in Python and C. Source code and a comprehensive user's manual are freely available at: http://code.google.com/p/rseqc/.

Contact: WL1@bcm.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
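To make the kinds of metrics named in the abstract more concrete, here is a minimal Python sketch that computes a mapped-read count, a reverse-strand count, and a per-read GC-content histogram from SAM-format text lines. It is an illustration only and is not RSeQC's code or interface: the function names `gc_fraction` and `summarize_sam` and the returned dictionary keys are hypothetical. It assumes uncompressed SAM text with the standard 11 mandatory tab-separated fields; reading BAM would require a library such as pysam.

```python
from collections import Counter


def gc_fraction(seq):
    """Return the fraction of G/C among called bases (A, C, G, T) in seq."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def summarize_sam(lines):
    """Compute simple QC summaries from an iterable of SAM text lines.

    Hypothetical helper for illustration; not part of RSeQC.
    """
    total = mapped = reverse = 0
    gc_hist = Counter()  # GC percentage (rounded) -> number of mapped reads

    for line in lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])  # bitwise FLAG field
        seq = fields[9]        # read sequence, '*' if not stored
        total += 1
        if flag & 0x4:         # 0x4: read is unmapped
            continue
        mapped += 1
        if flag & 0x10:        # 0x10: read aligned to the reverse strand
            reverse += 1
        if seq != "*":
            gc_hist[round(100 * gc_fraction(seq))] += 1

    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "reverse_strand_reads": reverse,
        "gc_histogram": dict(gc_hist),
    }
```

Passing an open SAM file (or any list of SAM lines) to `summarize_sam` returns a dictionary of counts. A strongly skewed GC histogram or an unexpected forward/reverse ratio would be the sort of signal a QC report is meant to surface, although a real tool would also use the gene model to interpret strand and coverage.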
