Publication | Closed Access
Navigating the maze of graph analytics frameworks using massive graph datasets
180
Citations
23
References
2014
Year
Unknown Venue
Cluster ComputingEngineeringPopular Graph FrameworksGraph DatabaseMost FrameworksGraph ProcessingData ScienceData MiningParallel ComputingHigh-performance Data AnalyticsGraph AlgorithmsGraph Analytics FrameworksKnowledge DiscoveryComputer ScienceGraph AlgorithmMassive Graph DatasetsGraph TheoryBusinessParallel ProgrammingGraph AnalysisMassive Data ProcessingBig Data
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity, that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.
| Year | Citations | |
|---|---|---|
Page 1
Page 1