Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

TLDR

High-throughput sequencing of PCR-amplified taxonomic markers such as the 16S rRNA gene has enabled detailed analysis of complex bacterial communities. Before abundances can be compared across conditions, the reads must be denoised and assigned to taxa in a reference database; conventional approaches cluster sequences at 97% similarity and normalize by subsampling to equalize library sizes. The authors show that statistical models yield more accurate abundance estimates than these conventional approaches. They provide a complete R workflow that uses dada2, phyloseq, DESeq2, ggplot2, and vegan to filter, visualize, and test microbiome data, with further examples of supervised analysis using random forests and nonparametric testing using community networks and the ggnetwork package.

Abstract

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
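
To make the conventional normalization mentioned above concrete, the following is a minimal Python sketch of subsampling (rarefying) count vectors to a common library size. It is an illustration only and is not taken from the paper's R workflow: the function name `rarefy`, the sample names, and the counts are all hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a vector of per-taxon read counts, without replacement,
    down to `depth` reads. Hypothetical helper for illustration only."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the library size of this sample")
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    # Draw `depth` reads without replacement and tally them per taxon.
    rarefied = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] += 1
    return rarefied


# Made-up counts for two samples with very different library sizes.
samples = {
    "A": [120, 30, 0, 50],      # 200 reads
    "B": [900, 300, 10, 790],   # 2000 reads
}

# Equalize library sizes by subsampling every sample to the smallest one.
depth = min(sum(c) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
```

After this step every sample has the same number of reads, but the reads discarded from the deeper sample (1800 of 2000 for `B` here) are simply thrown away, which is the information loss that a model-based approach is meant to avoid.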
