TLDR

Functional analysis of the microbiome can reveal how microbial perturbations affect host phenotypes, but indirect inference from 16S rRNA gene profiles with tools such as PICRUSt and Tax4Fun is limited by outdated functional databases, uncertain phylogenetic trees, and strict pre-processing requirements. The authors introduce Piphillin, a straightforward algorithm that does not depend on any phylogenetic tree, uses contemporary functional databases, and is not tied to a single pre-processing protocol. Evaluated against shotgun metagenomics in human clinical samples, Piphillin predicted gene composition better than PICRUSt and Tax4Fun (p<0.01 and p<0.001, respectively) and improved balanced accuracy for predicting disease associations with specific gene orthologs by 15% compared to PICRUSt. No tool showed an advantage in laboratory animal samples, and all three produced unsatisfactory predictions for environmental samples. The authors conclude that Piphillin is preferable for clinical biospecimens; it is publicly available for academic use at http://secondgenome.com/Piphillin.

Abstract

Functional analysis of a clinical microbiome facilitates the elucidation of mechanisms by which microbiome perturbation can cause a phenotypic change in the patient. The direct approach for the analysis of the functional capacity of the microbiome is via shotgun metagenomics. An inexpensive method to estimate the functional capacity of a microbial community is through collecting 16S rRNA gene profiles then indirectly inferring the abundance of functional genes. This inference approach has been implemented in the PICRUSt and Tax4Fun software tools. However, those tools have important limitations since they rely on outdated functional databases and uncertain phylogenetic trees and require very specific data pre-processing protocols. Here we introduce Piphillin, a straightforward algorithm independent of any proposed phylogenetic tree, leveraging contemporary functional databases and not obliged to any singular data pre-processing protocol. When all three inference tools were evaluated against actual shotgun metagenomics, Piphillin was superior in predicting gene composition in human clinical samples compared to both PICRUSt and Tax4Fun (p<0.01 and p<0.001, respectively) and Piphillin's ability to predict disease associations with specific gene orthologs exhibited a 15% increase in balanced accuracy compared to PICRUSt. From laboratory animal samples, no performance advantage was observed for any one of the tools over the others and for environmental samples all produced unsatisfactory predictions. Our results demonstrate that functional inference using the direct method implemented in Piphillin is preferable for clinical biospecimens. Piphillin is publicly available for academic use at http://secondgenome.com/Piphillin.
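
The general idea of inferring functional gene abundance from a 16S rRNA gene profile can be illustrated with a minimal sketch. This is not Piphillin's actual implementation, which the abstract does not describe in detail. It only assumes the generic approach of converting 16S read counts into estimated genome counts (dividing by each taxon's 16S copy number) and weighting each reference genome's gene content by that estimate. The function name, the taxon labels, the ortholog identifiers, and all numbers are hypothetical.

```python
from collections import defaultdict


def infer_gene_abundance(otu_counts, rrna_copies, gene_copies):
    """Estimate gene-ortholog abundance for one sample (illustrative only).

    otu_counts:  {taxon: 16S read count in the sample}
    rrna_copies: {taxon: 16S rRNA gene copies per reference genome}
    gene_copies: {taxon: {ortholog: copies per reference genome}}
    """
    abundance = defaultdict(float)
    for taxon, reads in otu_counts.items():
        # Taxa without a matching reference genome contribute nothing.
        if taxon not in rrna_copies or taxon not in gene_copies:
            continue
        # Convert 16S reads to an estimated number of genomes.
        genomes = reads / rrna_copies[taxon]
        # Add this taxon's gene content, weighted by its genome estimate.
        for ortholog, copies in gene_copies[taxon].items():
            abundance[ortholog] += genomes * copies
    return dict(abundance)


# Hypothetical inputs: taxon_C has no reference genome and is skipped.
otu_counts = {"taxon_A": 100, "taxon_B": 40, "taxon_C": 7}
rrna_copies = {"taxon_A": 5, "taxon_B": 2}
gene_copies = {
    "taxon_A": {"ortholog_1": 1, "ortholog_2": 2},
    "taxon_B": {"ortholog_2": 1},
}

predicted = infer_gene_abundance(otu_counts, rrna_copies, gene_copies)
print(predicted)
```

In this toy example both mapped taxa correspond to 20 estimated genomes, so ortholog_2 receives contributions from both. A prediction of this kind is what the evaluation compares against gene abundances measured directly by shotgun metagenomics.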
