Publication | Open Access
TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data
Citations: 4.3K
References: 31
Year: 2015
Topics: Engineering, DNA Methylation, Pathology, Bioinformatics Database, Tumor Biology, Research Network, TCGA Data, Bioanalysis, Biostatistics, Molecular Diagnostics, Cancer Research, Translational Bioinformatics, Biological Database, Biomolecular Analysis, Omics, Pathway Analysis, Functional Genomics, Bioinformatics, Omics Datasets, Computational Biology, Cancer Genomics, Microbiology, Systems Biology, Medicine
TCGA has released a large public dataset of over 10,000 tumor patients across 33 cancer types, with more than 20 marker studies, yet mining this data remains difficult due to challenges in data retrieval and integration with clinical and other molecular data. TCGAbiolinks was created to overcome these challenges by providing a bioinformatics solution that enables users to query, download, and analyze TCGA data within a guided workflow. The package integrates computer‑science and statistical methods, incorporates techniques from prior TCGA marker studies, and supports integration of RNA, DNA methylation, and clinical data through a modular pipeline. Case studies on kidney, brain, breast, and colon cancers demonstrate reproducibility, integrative analysis, and the ability of TCGAbiolinks to accelerate novel discoveries using Bioconductor packages.
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions through a guided workflow that allows users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon), we provide case studies that illustrate reproducibility, integrative analysis and the use of different Bioconductor packages to advance and accelerate novel discoveries.