Scalable RDF store based on HBase and MapReduce

Abstract

The growing size of Resource Description Framework (RDF) dataset requires RDF repository to be excellent scalable and highly efficient. Distributed and parallel processing model meets the urgent needs naturally. In this paper, we propose a scalable RDF store based on HBase, which is a distributed, column-oriented database modeled after Google's Bigtable. Our approach adopts the idea of Hexastore and considers both RDF data model and HBase capability. We store RDF triples into six HBase tables (S_PO, P_SO, O_SP, PS_O, SO_P and PO_S) which covers all combinations of RDF triple patterns. And we index them with HBase provided index structure on row key. Besides presenting the storage schema, we also propose a MapReduce strategy for SPARQL Basic Graph Pattern (BGP) processing, which is suitable for our storage schema. It uses multiple MapReduce jobs to process a typical BGP. In each job, it uses a greedy method to select join key and eliminates multiple triple patterns. The evaluation result indicates that our approach works well against large RDF dataset.

References

Page 1

	Year	Citations

Page 1