Piccolo: building fast, distributed programs with partitioned tables

TLDR

Piccolo is a data‑centric programming model that lets parallel in‑memory applications in data centers share distributed mutable state through a key‑value table interface, enabling efficient implementations. It allows developers to specify locality policies and uses a runtime that automatically resolves write‑write conflicts with user‑defined accumulation functions, supporting applications such as PageRank, k‑means clustering, and distributed crawling. Benchmarks on 100 Amazon EC2 instances and a 12‑node cluster show Piccolo outperforms existing data‑flow models on many problems while offering comparable fault tolerance and a convenient programming interface.

Abstract

Piccolo is a new data-centric programming model for writing parallel in-memory applications in data centers. Unlike existing data-flow models, Piccolo allows computation running on different machines to share distributed, mutable state via a key-value table interface. Piccolo enables efficient application implementations. In particular, applications can specify locality policies to exploit the locality of shared state access, and Piccolo's run-time automatically resolves write-write conflicts using user-defined accumulation functions. Using Piccolo, we have implemented applications for several problem domains, including the PageRank algorithm, k-means clustering and a distributed crawler. Experiments using 100 Amazon EC2 instances and a 12-machine cluster show Piccolo to be faster than existing data-flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.
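The idea of resolving write-write conflicts with an accumulation function can be illustrated with a small sketch. The code below is not Piccolo's API: `PartitionedTable`, `partition_of`, `update`, `get` and `pagerank_step` are hypothetical names, and everything runs in a single process, with a list of dictionaries standing in for partitions that would live on different machines. It only shows how concurrent updates to one key can be merged by a user-supplied function (here, addition) during one PageRank iteration.

```python
from typing import Callable, Dict, Hashable, List


class PartitionedTable:
    """Toy single-process stand-in for a partitioned key-value table.

    Hypothetical illustration only; this is not Piccolo's interface.
    """

    def __init__(self, num_partitions: int,
                 accumulate: Callable[[float, float], float]):
        self.num_partitions = num_partitions
        # User-defined function that merges a new value into an existing one.
        self.accumulate = accumulate
        self.partitions: List[Dict[Hashable, float]] = [
            {} for _ in range(num_partitions)
        ]

    def partition_of(self, key: Hashable) -> int:
        # Assumed placement policy: hash partitioning.
        return hash(key) % self.num_partitions

    def update(self, key: Hashable, value: float) -> None:
        # Writes to the same key are merged instead of overwriting each other.
        part = self.partitions[self.partition_of(key)]
        if key in part:
            part[key] = self.accumulate(part[key], value)
        else:
            part[key] = value

    def get(self, key: Hashable, default: float = 0.0) -> float:
        return self.partitions[self.partition_of(key)].get(key, default)


def pagerank_step(graph: Dict[str, List[str]], curr: PartitionedTable,
                  num_partitions: int = 2,
                  damping: float = 0.85) -> PartitionedTable:
    """One PageRank iteration; contributions to a page are summed."""
    n = len(graph)
    nxt = PartitionedTable(num_partitions, lambda a, b: a + b)
    for page in graph:
        nxt.update(page, (1.0 - damping) / n)
    for page, out_links in graph.items():
        if not out_links:
            continue  # dangling pages are ignored in this sketch
        share = damping * curr.get(page) / len(out_links)
        for target in out_links:
            # Several source pages may update the same target key.
            nxt.update(target, share)
    return nxt


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
curr = PartitionedTable(2, lambda a, b: a + b)
for page in graph:
    curr.update(page, 1.0 / len(graph))
nxt = pagerank_step(graph, curr)
```

Because addition is commutative and associative, the order in which updates reach a key does not change the result, which is what makes this kind of merge safe to apply without coordinating the writers. In this sketch page "c" receives contributions from both "a" and "b", and the table sums them.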
