Concepedia

Publication | Open Access

Topic modeling for untargeted substructure exploration in metabolomics

Citations: 365
References: 34
Year: 2016

TLDR

Tandem MS is a key technique for compound identification in untargeted metabolomics, yet the scarcity of reference spectra limits its utility. The authors introduce MS2LDA, an unsupervised method inspired by text mining that extracts common patterns of mass fragments and neutral losses, called Mass2Motifs, from collections of fragmentation spectra. Using 30 structurally characterized Mass2Motifs, they annotated roughly three times as many molecules in four beer extracts as library matching did. The motifs were validated against reference spectra from GNPS and MassBank. Structurally characterized Mass2Motifs thus allow annotation of molecules that lack reference spectra and reveal biochemical relationships between molecules.

Abstract

Significance

Tandem MS is a technique for compound identification in untargeted metabolomics experiments. Because of a lack of reference spectra, most molecules cannot be identified, and many spectra cannot be used. We present MS2LDA, an unsupervised method (inspired by text-mining) that extracts common patterns of mass fragments and neutral losses, called Mass2Motifs, from collections of fragmentation spectra. Structurally characterized Mass2Motifs can be used to annotate molecules for which no reference spectra exist and expose biochemical relationships between molecules. For four beer extracts, without training data, we show that, with 30 structurally characterized Mass2Motifs, we can annotate approximately three times as many molecules as with library matching. These Mass2Motifs were validated in reference spectra from Global Natural Products Social Molecular Networking (GNPS) and MassBank.
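The text-mining analogy can be illustrated with a minimal sketch: each fragmentation spectrum is treated as a "document" whose "words" are its fragment masses and the neutral losses relative to the precursor. The rounding to two decimals, the intensity-based word counts, the word naming scheme, and the toy spectra below are all assumptions made for illustration, not the preprocessing published with MS2LDA. The resulting count matrix is the kind of input a topic model such as LDA would consume; the model fitting itself is omitted here.

```python
from collections import Counter


def spectrum_to_document(precursor_mz, peaks, decimals=2):
    """Turn one MS2 spectrum into a bag of 'words' (illustrative only).

    peaks is a list of (fragment_mz, intensity) pairs. Each peak contributes
    a fragment word and, if the fragment is lighter than the precursor,
    a neutral-loss word. Counts are the intensity rounded to an integer.
    """
    doc = Counter()
    for mz, intensity in peaks:
        weight = max(1, round(intensity))
        doc[f"fragment_{round(mz, decimals):.{decimals}f}"] += weight
        loss = precursor_mz - mz
        if loss > 0:
            doc[f"loss_{round(loss, decimals):.{decimals}f}"] += weight
    return doc


def build_count_matrix(docs):
    """Build a shared vocabulary and a documents-by-words count matrix."""
    vocab = sorted({word for doc in docs for word in doc})
    matrix = [[doc.get(word, 0) for word in vocab] for doc in docs]
    return vocab, matrix


# Hypothetical toy spectra: (precursor m/z, [(fragment m/z, intensity), ...]).
spectra = [
    (268.10, [(136.06, 100.0), (119.04, 20.0)]),
    (152.06, [(136.06, 50.0), (110.03, 10.0)]),
]

docs = [spectrum_to_document(precursor, peaks) for precursor, peaks in spectra]
vocab, matrix = build_count_matrix(docs)

for word, counts in zip(vocab, zip(*matrix)):
    print(word, counts)
```

In this toy input both spectra share the word `fragment_136.06`. Words that co-occur across many spectra in this way are what a topic model would group into a single topic, which corresponds to a Mass2Motif in the terminology above.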
