Concepedia

Publication | Closed Access

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

3.6K

Citations

33

References

2012

Year

TLDR

RDDs are motivated by the inefficiencies of current frameworks for iterative algorithms and interactive data mining tools, where keeping data in memory can boost performance by an order of magnitude. The paper introduces Resilient Distributed Datasets (RDDs), a distributed memory abstraction enabling fault‑tolerant in‑memory computations on large clusters. RDDs achieve fault tolerance through coarse‑grained transformations on a restricted shared memory model, and are implemented in the Spark system, which is evaluated on diverse user applications and benchmarks. The authors show that RDDs are expressive enough to encompass a broad range of computations, including specialized models such as Pregel, and that Spark’s implementation performs well across various user applications and benchmarks.

Abstract

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.

References

YearCitations

2008

18.4K

1998

15.8K

1990

3.7K

2010

3.5K

2007

2.4K

2008

1.7K

2009

1.7K

2011

1.6K

2010

1.4K

2010

775

Page 1