Publication | Closed Access
Scalable RDF store based on HBase and MapReduce
Citations: 80
References: 8
Year: 2010
Unknown Venue
Cluster Computing, Engineering, MapReduce, Semantic Web, RDF Triples, Scalable RDF Store, Information Retrieval, Data Science, Database Support, Key-value Database, Data Integration, Data Management, RDF Repository, Distributed Query Processing, Big Data Search, Cloud Computing, RDF Triple Patterns, Distributed Data Store, Massive Data Processing, Big Data
The growing size of Resource Description Framework (RDF) datasets requires RDF repositories to be highly scalable and efficient. Distributed and parallel processing models naturally meet this need. In this paper, we propose a scalable RDF store based on HBase, a distributed, column-oriented database modeled after Google's Bigtable. Our approach adopts the idea of Hexastore and takes both the RDF data model and the capabilities of HBase into account. We store RDF triples in six HBase tables (S_PO, P_SO, O_SP, PS_O, SO_P and PO_S), which cover all combinations of RDF triple patterns, and we index them with the index structure that HBase provides on row keys. Besides presenting the storage schema, we also propose a MapReduce strategy for SPARQL Basic Graph Pattern (BGP) processing that suits this schema. It uses multiple MapReduce jobs to process a typical BGP. In each job, it uses a greedy method to select the join key and eliminates multiple triple patterns. The evaluation results indicate that our approach works well on large RDF datasets.
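As a rough illustration of the two ideas in the abstract, the following sketch shows how a triple pattern might be routed to one of the six tables and how a greedy join-key choice could split a BGP into successive jobs. Only the six table names come from the abstract. The mapping from bound positions to tables, the "most shared variable first" reading of the greedy method, and the helper names `choose_table` and `plan_jobs` are assumptions made for this example, and no HBase or MapReduce API is involved.

```python
from collections import Counter

# Table names are taken from the abstract. Mapping each set of bound
# positions to the table whose row key starts with those positions is
# an assumption of this sketch.
TABLE_FOR_BOUND = {
    frozenset("s"): "S_PO",
    frozenset("p"): "P_SO",
    frozenset("o"): "O_SP",
    frozenset("ps"): "PS_O",
    frozenset("so"): "SO_P",
    frozenset("po"): "PO_S",
}


def is_var(term):
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def choose_table(pattern):
    """Pick a table for a (subject, predicate, object) triple pattern."""
    bound = frozenset(pos for pos, term in zip("spo", pattern) if not is_var(term))
    if len(bound) == 3:
        return "SO_P"  # assumption: look up the (s, o) row, then check p
    if not bound:
        return "S_PO"  # assumption: a full scan of any one table suffices
    return TABLE_FOR_BOUND[bound]


def plan_jobs(patterns):
    """Greedily plan join jobs, returning [(join_variable, joined_patterns)].

    Each round picks the variable shared by the most remaining patterns
    (ties broken alphabetically), joins every pattern containing it, and
    replaces them with one intermediate result. A real job would also have
    to reconcile any other variables the joined inputs have in common.
    """
    remaining = list(patterns)
    jobs = []
    while len(remaining) > 1:
        counts = Counter(v for pat in remaining for v in set(filter(is_var, pat)))
        if not counts:
            break
        var, n = max(sorted(counts.items()), key=lambda kv: kv[1])
        if n < 2:
            break  # no variable is shared, so nothing is left to join
        joined = [pat for pat in remaining if var in pat]
        rest = [pat for pat in remaining if var not in pat]
        # The intermediate result is modeled as a tuple of its free variables.
        carried = tuple(sorted({t for pat in joined for t in pat if is_var(t)} - {var}))
        jobs.append((var, joined))
        remaining = rest + [carried]
    return jobs


# Hypothetical BGP used only for illustration.
bgp = [
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:takesCourse", "?c"),
    ("?t", "ub:teacherOf", "?c"),
    ("?x", "ub:advisor", "?t"),
]

for pattern in bgp:
    print(pattern, "->", choose_table(pattern))
for job_no, (var, joined) in enumerate(plan_jobs(bgp), start=1):
    print(f"job {job_no}: join on {var} over {len(joined)} inputs")
```

In this example the first job joins the three patterns that share `?x`, so a single job removes several triple patterns at once, which is the effect the abstract attributes to the greedy choice of join key.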