The power of protein interaction networks for associating genes with diseases

TLDR

Understanding which genes cause genetic diseases is critical for human health. Recent high-throughput protein-interaction data offer a new way to infer these associations, but the relative strengths of existing computational approaches remain unclear. We evaluated seven recently developed protein-interaction-based methods, plus several of their variants, to determine how well they predict gene–disease links. Random-walk approaches individually outperformed clustering and neighborhood methods, yet most methods made predictions that no other method made, and a consensus of the methods achieved Pareto-optimal performance. Diseases whose associated proteins are diffusely distributed in the network were harder to predict, which shows when network data alone suffice and when additional information sources are needed. Predictions are available at http://www.cbcb.umd.edu/DiseaseNet (contact: carlk@cs.umd.edu).

Abstract

Motivation: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists now have a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks of the proposed techniques.

Results: We assessed the utility of physical protein interactions for determining gene–disease associations by examining the performance of seven recently developed computational methods (plus several of their variants). We found that random-walk approaches individually outperform clustering and neighborhood approaches, although most methods make predictions not made by any other method. We show how combining these methods into a consensus method yields Pareto optimal performance. We also quantified how a diffuse topological distribution of disease-related proteins negatively affects prediction quality, and were thus able to identify diseases especially amenable to network-based predictions and others for which additional information sources are absolutely required.

Availability: The predictions made by each algorithm considered are available online at http://www.cbcb.umd.edu/DiseaseNet

Contact: carlk@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
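The abstract does not say which random-walk formulation the evaluated methods use. One common formulation for network-based disease-gene prioritization is random walk with restart (also known as personalized PageRank): a walker starts at the known disease genes (the "seeds"), at each step moves to a random interaction partner, and with probability `restart` jumps back to the seeds; candidate genes are then ranked by their steady-state visiting probability. The sketch below is a minimal illustration under those assumptions, not the implementation of any method in the study. The function names, the restart value of 0.7, and the toy gene names are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Undirected network as an adjacency map; every neighbor must also be a key.
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected protein-interaction network from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def random_walk_with_restart(graph: Graph, seeds: Set[str], restart: float = 0.7,
                             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate p <- restart * p0 + (1 - restart) * W p until the L1 change < tol.

    W spreads a node's probability mass uniformly over its interaction partners;
    p0 is uniform over the seed (known disease) genes present in the network.
    The restart value 0.7 is an arbitrary illustrative default (an assumption).
    """
    present = sorted(s for s in seeds if s in graph)
    if not present:
        raise ValueError("none of the seed genes is in the network")
    p0 = {n: 0.0 for n in graph}
    for s in present:
        p0[s] = 1.0 / len(present)
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in graph}
        for u, nbrs in graph.items():
            if not nbrs:
                # Isolated node: send its walking mass back to the seeds.
                targets, share = present, (1.0 - restart) * p[u] / len(present)
            else:
                targets, share = nbrs, (1.0 - restart) * p[u] / len(nbrs)
            for v in targets:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in graph)
        p = nxt
        if diff < tol:
            break
    return p


def rank_candidates(graph: Graph, seeds: Set[str], restart: float = 0.7) -> List[str]:
    """Non-seed genes ordered by steady-state probability (ties broken by name)."""
    scores = random_walk_with_restart(graph, seeds, restart)
    return sorted((n for n in scores if n not in seeds), key=lambda n: (-scores[n], n))


if __name__ == "__main__":
    # Toy network; gene names are hypothetical placeholders.
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("B", "E"), ("D", "F")]
    print(rank_candidates(build_graph(edges), seeds={"A", "B"}))  # ['E', 'C', 'D', 'F']
```

In the toy example, gene E interacts with both seeds and ranks first, while genes farther from the seeds receive progressively less probability. This mirrors the abstract's observation: when disease proteins are diffusely spread across the network, the walk's probability mass is diluted and network proximity alone discriminates candidates less well.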
