Discovering statistically significant biclusters in gene expression data

TLDR

In gene expression data, a bicluster is a subset of genes showing consistent patterns across a subset of conditions. The study proposes a new method to detect statistically significant biclusters in large gene expression datasets. The method combines graph-theoretic techniques with statistical modelling of the data, and was tested on a collection of yeast expression profiles and on a human cancer dataset. Under plausible assumptions, the algorithm runs in polynomial time and is guaranteed to find the most significant biclusters. In cross validation it shows high specificity when assigning function to genes based on their biclusters, which allows 196 uncharacterized yeast genes to be annotated. It also uncovers new biological associations, detects and relates finer tissue types in the cancer data than was previously possible, and outperforms the biclustering algorithm of Cheng and Church (2000). Contact: amos@tau.ac.il, roded@tau.ac.il, rshamir@tau.ac.il; the paper marks authors as having contributed equally.

Abstract

In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach is graph theoretic coupled with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant biclusters. We tested our method on a collection of yeast expression profiles and on a human cancer dataset. Cross validation results show high specificity in assigning function to genes based on their biclusters, and we are able to annotate in this way 196 uncharacterized yeast genes. We also demonstrate how the biclusters lead to detecting new concrete biological associations. In cancer data we are able to detect and relate finer tissue types than was previously possible. We also show that the method outperforms the biclustering algorithm of Cheng and Church (2000).

Contact: amos@tau.ac.il; roded@tau.ac.il; rshamir@tau.ac.il

*These authors contributed equally to this work.
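To make the notion of a bicluster concrete, the following minimal sketch treats an expression matrix as genes (rows) by conditions (columns) and scores a chosen submatrix with the mean squared residue, the coherence measure used by the Cheng and Church (2000) algorithm that the abstract compares against. It is not the graph-theoretic, statistical method proposed in the paper. The toy matrix, the index lists, and the function name `mean_squared_residue` are illustrative assumptions.

```python
import numpy as np


def mean_squared_residue(matrix, genes, conditions):
    """Mean squared residue of the submatrix given by genes x conditions.

    A lower value means the selected genes behave more coherently
    across the selected conditions (0 for a purely additive pattern).
    """
    sub = matrix[np.ix_(genes, conditions)]
    row_means = sub.mean(axis=1, keepdims=True)   # per-gene mean
    col_means = sub.mean(axis=0, keepdims=True)   # per-condition mean
    overall = sub.mean()                          # mean of the whole submatrix
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())


# Toy expression matrix: 5 genes x 4 conditions (made-up values).
expression = np.array([
    [1.0, 2.0, 9.0, 3.0],
    [2.0, 3.0, 1.0, 4.0],
    [7.0, 0.0, 5.0, 2.0],
    [4.0, 5.0, 8.0, 6.0],
    [0.0, 6.0, 2.0, 1.0],
])

# Hypothetical bicluster: genes 0, 1, 3 over conditions 0, 1, 3.
bicluster_genes = [0, 1, 3]
bicluster_conditions = [0, 1, 3]

bicluster_score = mean_squared_residue(
    expression, bicluster_genes, bicluster_conditions
)
full_score = mean_squared_residue(
    expression, list(range(5)), list(range(4))
)

print("bicluster residue:", bicluster_score)
print("full-matrix residue:", full_score)
```

The selected genes differ only by a constant offset across the selected conditions, so the submatrix scores zero up to floating-point rounding, while the full matrix scores much higher. A residue of this kind only measures coherence; it does not by itself assess statistical significance, which is the focus of the method described above.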
