Mizan: Optimizing Graph Mining in Large Parallel Systems

Abstract

Extracting information from graphs, from nding shortest\npaths to complex graph mining, is essential for many ap-\nplications. Due to the shear size of modern graphs (e.g.,\nsocial networks), processing must be done on large paral-\nlel computing infrastructures (e.g., the cloud). Earlier ap-\nproaches relied on the MapReduce framework, which was\nproved inadequate for graph algorithms. More recently, the\nmessage passing model (e.g., Pregel) has emerged. Although\nthe Pregel model has many advantages, it is agnostic to the\ngraph properties and the architecture of the underlying com-\nputing infrastructure, leading to suboptimal performance.\nIn this paper, we propose Mizan, a layer between the users'\ncode and the computing infrastructure. Mizan considers the\nstructure of the input graph and the architecture of the in-\nfrastructure in order to: (i) decide whether it is bene cial to\ngenerate a near-optimal partitioning of the graph in a pre-\nprocessing step, and (ii) choose between typical point-to-\npoint message passing and a novel approach that puts com-\nputing nodes in a virtual overlay ring. We deployed Mizan\non a small local Linux cluster, on the cloud (256 virtual\nmachines in Amazon EC2), and on an IBM Blue Gene/P\nsupercomputer (1024 CPUs). We show that Mizan executes\ncommon algorithms on very large graphs 1-2 orders of mag-\nnitude faster than MapReduce-based implementations and\nup to one order of magnitude faster than implementations\nrelying on Pregel-like hash-based graph partitioning.

References

Page 1

	Year	Citations

Page 1