Publication | Open Access
An integrated approach to inferring gene–disease associations in humans
Citations: 178 | References: 62 | Year: 2008
Modern bioinformatics aims to develop computational tools for understanding and treating human disease, and candidate gene prioritization methods are becoming increasingly useful. The authors propose PhenoPred, a supervised algorithm that detects gene–disease associations using the human protein–protein interaction network, known gene–disease associations, protein sequence, and molecular-level functional information. PhenoPred maps each gene onto disease and functional term spaces using network distances to annotated proteins, encodes sequence, functional, physicochemical, and predicted structural features, and trains support vector machines to predict gene–disease associations for Disease Ontology terms. The authors provide evidence that candidate genes can be identified successfully despite noisy, incomplete experimental data and an unfinished disease ontology, even when many disease terms are predicted simultaneously. PhenoPred is available at www.phenopred.org; the paper was published in Proteins (2008), © 2008 Wiley-Liss, Inc.
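The abstract does not give the exact function used to map a gene onto the space of disease or functional terms from its network distances, so the following Python sketch assumes one. Breadth-first search gives hop distances in an unweighted, undirected interaction network, and each protein annotated with a term contributes `decay ** distance` to that term's score. The protein names, term labels, the `decay` parameter, and the helper names (`build_graph`, `bfs_distances`, `term_profile`) are hypothetical illustrations, not PhenoPred's published implementation.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Hop counts from `source` to every protein reachable in the network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def term_profile(graph, gene, annotations, decay=0.5):
    """Map `gene` onto term space: one score per annotated term.

    Hypothetical scoring: every protein annotated with a term adds
    decay ** (hop distance to `gene`); unreachable proteins add nothing.
    The gene's own annotations are skipped so its label does not leak
    into its own features.
    """
    dist = bfs_distances(graph, gene)
    profile = {}
    for term in sorted(annotations):
        profile[term] = sum(
            (decay ** dist[p] for p in annotations[term]
             if p != gene and p in dist),
            0.0,
        )
    return profile


# Toy chain network A - B - C - D with hypothetical disease annotations.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D")])
annotations = {"disease_X": {"B", "D"}, "disease_Y": {"C"}}
print(term_profile(graph, "A", annotations))
# → {'disease_X': 0.625, 'disease_Y': 0.25}
```

Exponential decay is only one plausible way to turn "distance to all annotated proteins" into a score; distance to the nearest annotated protein or a diffusion kernel over the network would fit the same description.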
Abstract
One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored, and algorithms for candidate gene prioritization are becoming increasingly useful. Here, we propose an algorithm for detecting gene–disease associations based on the human protein–protein interaction network, known gene–disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene–disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise/incompleteness of experimental data and the unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted simultaneously. Availability: www.phenopred.org. Proteins 2008. © 2008 Wiley-Liss, Inc.
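The supervised step trains support vector machines for a number of Disease Ontology terms. As a hedged illustration only, the sketch below treats each term as an independent binary problem and fits a linear SVM with the Pegasos stochastic sub-gradient method in plain Python. The kernel, solver, regularization constant `lam`, and exact feature vectors PhenoPred used are not stated in the abstract; the function names and toy data here are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM fitted with Pegasos (stochastic sub-gradient descent on
    the L2-regularised hinge loss). X: list of feature vectors; y: labels
    in {-1, +1}. A constant 1.0 is appended to each vector so the last
    weight acts as a (regularised) bias term."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Gradient of the regulariser: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient applies only inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score of feature vector x (positive = associated)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_per_term(X, labels_by_term, **kwargs):
    """One independent binary SVM per disease term."""
    return {term: train_linear_svm(X, y, **kwargs)
            for term, y in labels_by_term.items()}


def predict_terms(models, x):
    """Disease terms whose classifier places x on the positive side."""
    return sorted(term for term, w in models.items() if decision(w, x) > 0)


# Hypothetical toy data: two features per gene, two disease terms.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels_by_term = {
    "disease_X": [1, 1, -1, -1],
    "disease_Y": [-1, -1, 1, 1],
}
models = train_per_term(X, labels_by_term)
print(predict_terms(models, [0.95, 0.05]))
print(predict_terms(models, [0.05, 0.95]))
```

Because every term gets its own classifier, a gene can be assigned several disease terms at once, which matches the setting of predicting many terms simultaneously. In practice a library SVM (for example scikit-learn's `SVC`) would replace the hand-written solver; the plain-Python version only keeps the example dependency-free.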