
Publication | Closed Access

CoS-HDFS

Citations: 11

References: 21

Year: 2016

Abstract

Given recent advances in ubiquitous positioning technologies, it is now common to query terabytes of spatial data. These massive datasets are usually geo-distributed across multiple data centers to ensure their availability, yet at least one replica of the data is stored close to where the data are generated. Spatial queries are complex and computationally intensive; therefore, distributed computation platforms such as Hadoop are now used to improve their execution time. However, Hadoop is agnostic to the characteristics of spatial data: it partitions and places the data stored in its distributed file system randomly, which degrades the performance of spatial queries. In this paper, we propose CoS-HDFS, an extension to the Hadoop Distributed File System (HDFS) that takes the spatial characteristics of the data into account and co-locates them accordingly on HDFS nodes that span multiple data centers. We integrate CoS-HDFS with SpatialHadoop, a MapReduce framework that natively supports spatial data, to make use of its implementation of spatial indexes, operations, and query interfaces. We experimentally demonstrate a significant reduction in network usage and total execution time for spatial join queries on the TIGER dataset.
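The co-location idea in the abstract can be illustrated with a small sketch. This is not the CoS-HDFS placement algorithm, which the abstract does not specify; it is a toy model that assumes a uniform grid over the coordinate space and a deterministic hash from grid cell to node. The node names, the cell size, and all function names below are hypothetical.

```python
import zlib
from collections import defaultdict


def grid_cell(x, y, cell_size):
    """Map a point to the (column, row) of the uniform grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def node_for_cell(cell, nodes):
    """Pick a node for a grid cell with a deterministic hash (an assumption,
    not the paper's policy). The same cell always maps to the same node."""
    key = f"{cell[0]},{cell[1]}".encode()
    return nodes[zlib.crc32(key) % len(nodes)]


def place(records, nodes, cell_size):
    """Group (id, x, y) records by the node chosen for their grid cell."""
    placement = defaultdict(list)
    for rec_id, x, y in records:
        cell = grid_cell(x, y, cell_size)
        placement[node_for_cell(cell, nodes)].append((rec_id, cell))
    return placement


# Hypothetical nodes spread over two data centers.
nodes = ["dc1-node1", "dc1-node2", "dc2-node1"]

# Two toy datasets whose records overlap spatially.
roads = [("r1", 10.5, 20.1), ("r2", 55.0, 42.0)]
rivers = [("v1", 12.0, 23.9), ("v2", 58.3, 40.2)]

road_placement = place(roads, nodes, cell_size=10)
river_placement = place(rivers, nodes, cell_size=10)

# Records from both datasets that share a grid cell end up on the same node,
# so a per-cell spatial join can run without moving data between nodes.
for node in nodes:
    print(node, road_placement.get(node, []), river_placement.get(node, []))
```

Because placement depends only on the grid cell, partitions of different datasets that cover the same region land on the same node. Random placement gives no such guarantee, which is the source of the extra network traffic the abstract describes.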
