Concepedia

Publication | Closed Access

Graphs-at-a-time

440

Citations

29

References

2008

Year

TLDR

Graph data are increasingly common across domains, yet querying heterogeneous graphs is difficult because subgraph isomorphism is NP‑complete. The authors propose a query language that supports arbitrary attributes on nodes, edges, and graphs. The language treats graphs as the basic unit, extends relational algebra with a generalized selection operator for graph pattern matching and a composition operator for rewriting matched graphs, and employs neighborhood subgraphs, profile‑based pruning, joint search‑space reduction, and search‑order optimization to accelerate selection. Experimental results on real and synthetic large graphs demonstrate that these graph‑specific optimizations outperform an SQL‑based implementation by orders of magnitude.

Abstract

With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.

References

YearCitations

Page 1