Publication | Open Access
A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data
Year: 2015
Citations: 295
References: 36
High-throughput omics technologies generate large-scale genomic data, and integrating these datasets is increasingly pursued to gain deeper biological insight. Heterogeneity across the data sources, however, makes the coordinated signal hard to separate from source-specific noise. This study introduces a method for multi-modal data analysis, based on non-negative matrix factorization, that is designed for heterogeneous omics datasets. The method jointly decomposes multiple data matrices with an optional sparsity constraint. Its performance was assessed on synthetic data and on DNA methylation, gene expression, and miRNA expression profiles from ovarian cancer samples in The Cancer Genome Atlas (TCGA). The analysis revealed common modules across patient samples that are linked to cancer-related pathways, as well as previously established ovarian cancer subtypes. Source code is available at https://github.com/yangzi4/iNMF. The contact address is gmichail@umich.edu, and supplementary data are available at Bioinformatics online.
Abstract

Motivation: Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects.

Results: We introduce a novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization. We provide an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes.

Availability and implementation: The source code repository is publicly available at https://github.com/yangzi4/iNMF.

Contact: gmichail@umich.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
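To make the idea of jointly decomposing several data matrices more concrete, here is a minimal sketch of a generic joint non-negative matrix factorization. It is not the authors' algorithm (the repository linked above holds that). It assumes each data matrix has the same samples as rows, shares one non-negative sample factor `W` across all sources, and uses standard multiplicative updates with an optional L1 penalty on the source-specific factors. The function names `joint_nmf` and `reconstruction_error`, the `sparsity` parameter, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def joint_nmf(matrices, rank, sparsity=0.0, n_iter=200, seed=0, eps=1e-10):
    """Generic joint NMF sketch (an assumption, not the paper's method).

    Approximates each non-negative X_k (samples x features_k) as W @ H_k,
    with W (samples x rank) shared across sources. Heuristically targets
    0.5 * sum_k ||X_k - W H_k||_F^2 + sparsity * sum_k ||H_k||_1.
    """
    rng = np.random.default_rng(seed)
    n_samples = matrices[0].shape[0]
    W = rng.random((n_samples, rank))
    Hs = [rng.random((rank, X.shape[1])) for X in matrices]
    for _ in range(n_iter):
        # Update each source-specific factor; the L1 weight enters the denominator.
        for k, X in enumerate(matrices):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + sparsity + eps)
        # Update the shared factor using contributions from every source.
        numer = sum(X @ H.T for X, H in zip(matrices, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
    return W, Hs

def reconstruction_error(matrices, W, Hs):
    """Sum of squared Frobenius residuals over all sources."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(matrices, Hs))

# Toy usage: two synthetic "omics" sources that share the same 30 samples.
rng = np.random.default_rng(1)
W_true = rng.random((30, 3))
X1 = W_true @ rng.random((3, 20))
X2 = W_true @ rng.random((3, 15))
W, Hs = joint_nmf([X1, X2], rank=3, sparsity=0.01)
print(reconstruction_error([X1, X2], W, Hs))
```

Sharing `W` is what ties the sources together in this sketch: it is equivalent to factorizing the column-wise concatenation of the matrices. A method aimed at heterogeneous data would additionally need some way to account for source-specific effects, which this simplified version does not attempt.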