Publication | Closed Access

High-performance, massively scalable distributed systems using the MapReduce software framework

Citations: 152

References: 6

Year: 2010

TLDR

The paper explores how MapReduce can be used to build high-performance, massively scalable distributed systems. It examines design challenges such as scalable index construction, focuses on Hadoop, and proposes a general approach to building information systems that answer data queries. The discussion is motivated by SHARD, a triple-store built on top of Hadoop, for which the authors report experimental results from an early version. The paper closes by considering hypothetical alternatives to MapReduce.

Abstract

In this paper we discuss the use of the MapReduce software framework to address the challenge of constructing high-performance, massively scalable distributed systems. We discuss several design considerations associated with constructing complex distributed systems using the MapReduce software framework, including the difficulty of scalably building indexes. We focus on Hadoop, the most popular MapReduce implementation. Our discussion and analysis are motivated by our construction of SHARD, a massively scalable, high-performance and robust triple-store technology on top of Hadoop. We provide a general approach to construct an information system from the MapReduce software framework that responds to data queries. We provide experimental results generated from an early version of SHARD. We close with a discussion of hypothetical MapReduce alternatives that can be used for the construction of more scalable distributed computing systems.
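
To make the idea of answering a data query with map and reduce steps concrete, the following is a minimal single-process sketch in Python. It is not SHARD's implementation and does not use Hadoop. The sample triples, the pattern format (with None as a wildcard), and the function names map_phase, shuffle, reduce_phase, and run_query are all assumptions introduced for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "livesIn", "Cambridge"),
    ("bob", "livesIn", "Boston"),
    ("alice", "owns", "car1"),
    ("bob", "owns", "car2"),
    ("carol", "livesIn", "Cambridge"),
]


def map_phase(triples, pattern):
    """Emit (subject, triple) for each triple matching the pattern.

    A None entry in the pattern acts as a wildcard (an assumption of this sketch).
    """
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple[0], triple


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce runtime would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collect the matching triples for each subject into a sorted list."""
    return {key: sorted(values) for key, values in groups.items()}


def run_query(triples, pattern):
    """Run the map, shuffle, and reduce steps in sequence for one triple pattern."""
    return reduce_phase(shuffle(map_phase(triples, pattern)))


# Who lives in Cambridge?
result = run_query(TRIPLES, (None, "livesIn", "Cambridge"))
```

Here the query scans every triple instead of consulting an index, which sidesteps the scalable index construction problem the abstract mentions. In a real deployment the map and reduce functions would run as distributed tasks over partitioned data.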
