Concepedia

Publication | Closed Access

Graph indexing

Citations: 634

References: 18

Year: 2004

TLDR

Graph modeling is increasingly important for complex, schemaless data such as proteins, chemical compounds, and XML documents. The paper investigates the challenges of indexing graphs and proposes gIndex, a solution that applies graph mining to retrieve graphs quickly from large databases. gIndex uses frequent substructures as its basic indexing features; these are good candidates because they capture intrinsic characteristics of the data and remain relatively stable under database updates. Two techniques, a size-increasing support constraint and discriminative fragments, keep the index small. In the performance study, the gIndex index is 10 times smaller than that of the path-based method GraphGrep, while query performance is 3 to 10 times better. The concepts extend to indexing sequences, trees, and other complex structures, and illustrate how frequent pattern mining can benefit database indexing and query processing.

Abstract

Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Different from the existing path-based methods, our approach, called gIndex, makes use of frequent substructures as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of the index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has a 10 times smaller index size, but achieves 3 to 10 times better performance in comparison with a typical path-based method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
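To make the two index-reduction ideas more concrete, here is a minimal Python sketch of feature selection and candidate filtering. It is not the authors' implementation. The linear form of the support threshold, the definition of the discriminative ratio, the parameter names (`max_support`, `gamma_min`), and the toy fragment table are all assumptions made for illustration. Fragment mining and the final subgraph-isomorphism verification step are omitted; each fragment simply comes with the precomputed set of graph ids that contain it.

```python
from typing import Dict, List, NamedTuple, Set


class Fragment(NamedTuple):
    size: int          # number of edges in the fragment
    graphs: Set[int]   # ids of database graphs that contain the fragment
    parts: List[str]   # names of smaller fragments contained in this one


def min_support(size: int, max_size: int, max_support: float) -> float:
    """Assumed linear size-increasing threshold: larger fragments need more support."""
    return max(1.0, max_support * size / max_size)


def select_features(frags: Dict[str, Fragment], n_graphs: int, max_size: int,
                    max_support: float, gamma_min: float) -> Dict[str, Set[int]]:
    """Keep fragments that are frequent for their size and that prune more
    than the already-selected fragments they contain (assumed ratio test)."""
    index: Dict[str, Set[int]] = {}
    # Visit small fragments first so sub-fragments are decided before super-fragments.
    for name, frag in sorted(frags.items(), key=lambda kv: kv[1].size):
        if len(frag.graphs) < min_support(frag.size, max_size, max_support):
            continue  # not frequent enough for its size
        # Graphs that the selected sub-fragments alone cannot rule out.
        covered = set(range(n_graphs))
        for part in frag.parts:
            if part in index:
                covered &= index[part]
        if len(covered) / len(frag.graphs) >= gamma_min:
            index[name] = frag.graphs
    return index


def candidates(index: Dict[str, Set[int]], query_fragments: List[str],
               n_graphs: int) -> Set[int]:
    """Intersect the posting lists of indexed fragments found in the query.
    The survivors would still need exact subgraph-isomorphism verification."""
    result = set(range(n_graphs))
    for name in query_fragments:
        if name in index:
            result &= index[name]
    return result


# Hypothetical toy database of 6 graphs (ids 0..5) with hand-written fragments.
frags = {
    "C-C":     Fragment(1, {0, 1, 2, 3, 4, 5}, []),
    "C-O":     Fragment(1, {0, 1, 2, 3}, []),
    "C-C-O":   Fragment(2, {0, 1, 2, 3}, ["C-C", "C-O"]),
    "C-C-C":   Fragment(2, {0, 1}, ["C-C"]),
    "C-C-C-O": Fragment(3, {0}, ["C-C", "C-O", "C-C-O", "C-C-C"]),
}
index = select_features(frags, n_graphs=6, max_size=3, max_support=2, gamma_min=1.5)
result = candidates(index, ["C-C", "C-O", "C-C-C"], n_graphs=6)
print(sorted(index), sorted(result))
```

In this toy run, `C-C` is skipped because it occurs in every graph and so prunes nothing, `C-C-O` is skipped because it is no more selective than the already-indexed `C-O`, and `C-C-C-O` falls below the support threshold for its size. A query containing `C-C`, `C-O`, and `C-C-C` is then narrowed to the graphs shared by the two indexed fragments before any expensive verification.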
