Concepedia

Publication | Closed Access

RDF-3X

Citations: 551

References: 29

Year: 2008

TLDR

RDF is a schema-free data representation format gaining traction in Semantic-Web corpora, life sciences, and Web 2.0 platforms, but its pay-as-you-go nature and SPARQL's flexible pattern matching create efficiency and scalability challenges for complex queries, especially those with long join paths. This paper introduces RDF-3X, a SPARQL engine that achieves high performance through a RISC-style, streamlined architecture with carefully designed, puristic data structures and operations. RDF-3X stores and indexes RDF triples without physical-design tuning, processes queries with fast merge joins wherever possible, and optimizes join orders using a cost model based on statistical synopses of entire join paths. On datasets with over 50 million triples, RDF-3X outperforms the previous state-of-the-art systems on benchmark queries involving pattern matching and long join paths.

Abstract

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries that include long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style, streamlined architecture with carefully designed, puristic data structures and operations. The salient points of RDF-3X are: 1) a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, 2) a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and 3) a query optimizer that chooses optimal join orders using a cost model based on statistical synopses for entire join paths. The performance of RDF-3X, in comparison to the previous state-of-the-art systems, has been measured on several large-scale datasets with more than 50 million RDF triples and on benchmark queries that include pattern matching and long join paths in the underlying data graphs.
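
The first two points of the abstract can be illustrated with a small sketch. It is not the engine's code: it assumes that triples are already encoded as integer IDs, that "no physical-design tuning" is achieved by keeping the triples sorted under every ordering of subject, predicate, and object, and that in-memory sorted lists stand in for disk-based indexes. The names `build_indexes`, `scan`, and `merge_join` are hypothetical.

```python
from bisect import bisect_left
from itertools import permutations


def build_indexes(triples):
    """Keep the triples sorted under all six orderings of (s, p, o)."""
    indexes = {}
    for order in permutations(range(3)):
        name = "".join("spo"[i] for i in order)
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes


def scan(index, prefix):
    """Return the entries of a sorted index that start with `prefix`."""
    start = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    rows = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        rows.append(entry)
    return rows


def merge_join(left, right):
    """Join two lists of (key, payload) pairs that are both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    out.append((lk, a[1], b[1]))
            i, j = i_end, j_end
    return out


# Toy data: (subject, predicate, object) as integer IDs (an assumption).
triples = [(10, 1, 20), (11, 1, 21), (10, 2, 30), (12, 2, 31), (10, 1, 22)]
indexes = build_indexes(triples)

# Pattern: ?x <1> ?y . ?x <2> ?z  (join on ?x)
# The "pso" index yields both inputs already sorted by subject.
left = [(s, o) for (_, s, o) in scan(indexes["pso"], (1,))]
right = [(s, o) for (_, s, o) in scan(indexes["pso"], (2,))]
result = merge_join(left, right)
print(result)  # [(10, 20, 30), (10, 22, 30)]
```

Because every ordering is available, each triple pattern can be read from an index that is already sorted on the join variable, so the join needs no separate sort step. The cost-based join ordering described in the third point is not modeled here.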
