Publication | Closed Access
Scale-Out Processing of Large RDF Datasets
Year: 2015 | Citations: 32 | References: 34
Topics: Cluster Computing, Distributed Filters, Engineering, Large RDF Datasets, Lightweight Primary Index, Big Data Indexing, MapReduce, Semantic Web, Information Retrieval, Data Science, Data Mining, Management, Data Integration, Parallel Computing, Data Management, Knowledge Discovery, Computer Science, Big Data Search, Distributed Query Processing, Query Optimization, Parallel Programming, Massive Data Processing, Big Data
Distributed RDF data management systems are becoming increasingly important as the Semantic Web grows. However, current methods encounter performance bottlenecks in either data loading or querying when processing large amounts of data. In this work, we propose efficient methods for processing RDF using dynamic data re-partitioning to enable rapid analysis of large datasets. Our approach adopts a two-tier index architecture on each computation node: (1) a lightweight primary index, to keep loading times low, and (2) a series of dynamic, multi-level secondary indexes, calculated as a by-product of query execution, to reduce or eliminate inter-machine data movement for subsequent queries that contain the same graph patterns. In addition, we propose methods to replace some secondary indexes with distributed filters, so as to decrease memory consumption. Experimental results on a commodity cluster with 16 nodes show that the method has good scale-out characteristics and can vastly improve loading speeds while remaining competitive in query performance. Specifically, our approach can load a dataset of 1.1 billion triples at a rate of 2.48 million triples per second and provides performance competitive with RDF-3X and 4store for expensive queries.
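To make the two-tier idea concrete, here is a minimal single-process sketch, assuming that the primary index simply hash-partitions triples by subject into unsorted per-node lists, and that a secondary index is the result of re-partitioning one predicate's triples by object when a query first needs to join on that position. The class `ToyCluster`, its methods, and the CRC32 partitioning function are hypothetical illustrations, not the authors' implementation, and the sketch models only one level of secondary index (the abstract's multi-level indexes over intermediate results are not modeled). It counts "moved" triples to show that repeating a query with the same graph pattern causes no further data movement.

```python
import zlib
from collections import defaultdict


class ToyCluster:
    """In-memory model of n nodes (hypothetical; not the paper's code)."""

    def __init__(self, n_nodes: int):
        self.n = n_nodes
        # Tier 1: lightweight primary index. Triples are only hash-partitioned
        # by subject and appended to a list, so loading does no sorting.
        self.primary = [[] for _ in range(n_nodes)]
        # Tier 2: secondary indexes, filled lazily during query execution.
        # Key: (predicate, "o") -> per-node lists partitioned by object.
        self.secondary = {}
        self.moved = 0  # triples shipped between nodes so far

    def node_of(self, term: str) -> int:
        # Deterministic hash (Python's built-in str hash is salted per process).
        return zlib.crc32(term.encode()) % self.n

    def load(self, triples):
        for s, p, o in triples:
            self.primary[self.node_of(s)].append((s, p, o))

    def by_object(self, pred: str):
        """Re-partition triples with predicate `pred` by object, only once."""
        key = (pred, "o")
        if key not in self.secondary:
            parts = [[] for _ in range(self.n)]
            for src, local in enumerate(self.primary):
                for s, p, o in local:
                    if p == pred:
                        dst = self.node_of(o)
                        if dst != src:
                            self.moved += 1  # would cross the network
                        parts[dst].append((s, p, o))
            # Keep the re-partitioned data as a secondary index for reuse.
            self.secondary[key] = parts
        return self.secondary[key]

    def path_join(self, p1: str, p2: str):
        """Answer ?x p1 ?y . ?y p2 ?z using only node-local joins."""
        left = self.by_object(p1)  # partitioned by ?y (object of p1)
        results = []
        for node in range(self.n):
            # Right side is already partitioned by subject (= ?y).
            right = defaultdict(list)
            for s, p, o in self.primary[node]:
                if p == p2:
                    right[s].append(o)
            for x, _, y in left[node]:
                for z in right.get(y, []):
                    results.append((x, y, z))
        return sorted(results)


cluster = ToyCluster(4)
cluster.load([
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
])
print(cluster.path_join("knows", "worksAt"))
# [('alice', 'bob', 'acme'), ('bob', 'carol', 'initech')]
moved_first = cluster.moved
cluster.path_join("knows", "worksAt")  # reuses the cached secondary index
assert cluster.moved == moved_first
```

The design trade-off the sketch illustrates is that loading pays only for a hash and an append, while the cost of re-partitioning is deferred to the first query that needs it and then amortized over later queries with the same pattern.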
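The abstract does not say which kind of distributed filter replaces a secondary index. A Bloom filter is one common choice, so the following is a minimal sketch under that assumption: instead of materializing re-partitioned triples, a node keeps a fixed-size bit array of the join keys present on the other side of a join and drops non-matching triples before shipping them. `BloomFilter` and `filter_before_shipping` are hypothetical names; the double-hashing scheme built on `hashlib.sha256` is an illustrative choice.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; may give false positives, never false negatives."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))


def filter_before_shipping(candidates, join_keys: BloomFilter):
    """Keep only triples whose object may match a join key on the other side."""
    return [t for t in candidates if t[2] in join_keys]


# Subjects that have a worksAt triple are the possible ?y join keys.
keys = BloomFilter(m_bits=1024, k_hashes=3)
for subject in ("bob", "carol"):
    keys.add(subject)

knows = [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("carol", "knows", "dave")]
# ("carol", "knows", "dave") is dropped unless "dave" happens to be a false positive.
shipped = filter_before_shipping(knows, keys)
print(len(keys.bits), "bytes of filter state")  # 128 bytes of filter state
```

The memory saving comes from the filter's size being fixed by `m_bits` rather than growing with the number of triples stored; the price is that a small fraction of non-joining triples may still be shipped and must be discarded by the actual join.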
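For a sense of scale, the reported loading figures imply the following end-to-end time; the per-node figure assumes, for illustration only, that the rate was achieved on all 16 nodes with the load spread evenly, which the abstract does not state.

$$
t_{\text{load}} = \frac{1.1 \times 10^{9}\ \text{triples}}{2.48 \times 10^{6}\ \text{triples/s}} \approx 443.5\ \text{s} \approx 7.4\ \text{min}
$$

$$
\frac{2.48 \times 10^{6}\ \text{triples/s}}{16\ \text{nodes}} = 1.55 \times 10^{5}\ \text{triples/s per node}
$$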