Efficient random data accessing in MapReduce

Abstract

The voluminous data can not be handled using traditional serial programming methods. It needs to be dealt effectively using parallel programming methods in a distributed environment. Emerging technologies for parallel processing has been changing the concept of programming, storage and operating system in distributed environment. Grid Computing and MapReduce technologies have been proven very handy in processing huge volume of simple and multi-dimensional data. The MapReduce in the Hadoop provides an abstract environment for parallel processing of jobs. The framework is well-known for its data analysis capability. However, it is efficient for sequential read and writes. It does not show good performance for random read and writes. It is so because the Hadoop is based on the key-value storage concept and does not work on any indexed dataset. A lot of work has been done to improve the performance of the Hadoop. Indexing input dataset in the Hadoop is one such area. In this paper, a B-Tree index construction process in traditional programming environment is described and a conceptual idea of realizing it in the MapReduce-Hadoop is presented.

References

Page 1

	Year	Citations

Page 1