TLDR

Graph data is common across many domains, yet its analysis typically requires specialized engines, making it difficult for users and hindering optimization of end-to-end workflows. GraphFrames is an integrated system that lets users combine graph algorithms, pattern matching, and relational queries while optimizing across them. It materializes multiple graph views, executes iterative algorithms and pattern matching through joins, exposes a declarative data-frame API, and applies graph-aware join optimization to select the best view for each computation.

Abstract

Graph data is prevalent in many domains, but it has usually required specialized engines to analyze. This design is onerous for users and precludes optimization across complete workflows. We present GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them. GraphFrames generalize the ideas in previous graph-on-RDBMS systems, such as GraphX and Vertexica, by letting the system materialize multiple views of the graph (not just the specific triplet views in these systems) and executing both iterative algorithms and pattern matching using joins. To make applications easy to write, GraphFrames provide a concise, declarative API based on the "data frame" concept in R that can be used for both interactive queries and standalone programs. Under this API, GraphFrames use a graph-aware join optimization algorithm across the whole computation that can select from the available views.
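The abstract's central technical claim is that both pattern matching and iterative graph algorithms can be executed as joins over vertex and edge tables. The following is a minimal sketch of that idea, assuming plain Python lists of dicts as stand-ins for data frames. The helper names `hash_join`, `find_two_cycles`, and `connected_components` are hypothetical. They are not part of the GraphFrames API, and this is not the system's implementation. The motif in the docstring uses the same style as the pattern strings accepted by GraphFrames' `find` method. Connected components is computed here by simple min-label propagation, one join per round.

```python
from collections import defaultdict

# Illustrative sketch only: lists of dicts stand in for the vertex and edge
# data frames. All helper names below are hypothetical, not GraphFrames API.


def hash_join(left, right, left_key, right_key):
    """Equi-join two row lists: build a hash index on `right`, probe with `left`."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def find_two_cycles(edges):
    """Match the motif (a)-[e1]->(b); (b)-[e2]->(a) as a self-join of the edge table."""
    e1 = [{"a": s, "b": d} for s, d in edges]
    e2 = [{"b2": s, "a2": d} for s, d in edges]
    # Join on e1.dst = e2.src, then keep rows whose e2.dst closes the cycle back to a.
    joined = hash_join(e1, e2, "b", "b2")
    return sorted({(row["a"], row["b"]) for row in joined if row["a2"] == row["a"]})


def connected_components(vertices, edges):
    """Min-label propagation where each round joins the edges with the current labels."""
    labels = {v: v for v in vertices}
    # Treat the graph as undirected by emitting each edge in both directions.
    both = [{"src": s, "dst": d} for s, d in edges] + [{"src": d, "dst": s} for s, d in edges]
    while True:
        label_rows = [{"id": v, "label": lab} for v, lab in labels.items()]
        # Each joined row is a message carrying the source's label to its neighbour.
        messages = hash_join(both, label_rows, "src", "id")
        new_labels = dict(labels)
        for m in messages:
            if m["label"] < new_labels[m["dst"]]:
                new_labels[m["dst"]] = m["label"]
        if new_labels == labels:  # fixpoint: no label changed this round
            return labels
        labels = new_labels


if __name__ == "__main__":
    print(find_two_cycles([("x", "y"), ("y", "x"), ("y", "z")]))
    # [('x', 'y'), ('y', 'x')]
    print(connected_components(["a", "b", "c", "d"], [("a", "b"), ("b", "c")]))
    # {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd'}
```

In this sketch, the join order and the single edge-table "view" are fixed by hand. The abstract's point is that a graph-aware optimizer can instead pick the join plan and choose among several materialized views of the graph.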