Fast Random Walk with Restart and Its Applications

TLDR

Random walk with restart (RWR) is a widely used relevance measure for node pairs in weighted graphs, applied to tasks such as image captioning and personalized PageRank, but its naive implementations are infeasible on large, disk‑resident graphs due to quadratic space and cubic pre‑computation requirements. This work aims to develop fast RWR solutions that overcome these scalability limitations. The authors achieve this by leveraging two common graph properties—linear correlations and community‑like block structure—using low‑rank matrix approximation, graph partitioning, and the Sherman–Morrison lemma for efficient matrix inversion. Experiments on the Corel image and DBLP datasets show that the proposed methods reduce pre‑computation and storage costs by several orders of magnitude and deliver up to 150× speed‑ups while preserving over 90 % of the original quality.

Abstract

How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the "connection subgraphs", personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block- wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman- Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP dabasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation.

References

Page 1

	Year	Citations

Page 1