Publication | Closed Access
FaRM: fast remote memory
502
Citations
40
References
2014
Year
Cluster ComputingEngineeringComputer ArchitectureGraph StoreParallel StorageLocation TransparencyShared MemoryComputing SystemsKeyvalue DatabaseParallel ComputingData ManagementFast Remote MemoryDistributed SystemsComputer ScienceDistributed Data StorageMemory ArchitectureParallel ProgrammingBest Rdma PerformanceDistributed Data StoreIn-storage Computing
FaRM is a distributed main‑memory platform that uses RDMA to provide an order‑of‑magnitude improvement in latency and throughput over TCP/IP‑based systems, offering a simple transaction‑based programming model suitable for most applications. FaRM exposes cluster memory as a shared address space, enabling lock‑free RDMA reads, transactionally allocated objects, and collocated function shipping, all tuned for optimal RDMA performance. In a 20‑machine cluster, FaRM’s key‑value store achieves 167 million lookups per second with 31 µs latency, demonstrating its high performance.
We describe the design and implementation of FaRM, a new main memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state of the art main memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can use transactions to allocate, read, write, and free objects in the address space with location transparency. We expect this simple programming model to be sufficient for most application code. FaRM provides two mechanisms to improve performance where required: lock-free reads over RDMA, and support for collocating objects and function shipping to enable the use of efficient single machine transactions. FaRM uses RDMA both to directly access data in the shared address space and for fast messaging and is carefully tuned for the best RDMA performance. We used FaRM to build a key-value store and a graph store similar to Facebook's. They both perform well, for example, a 20-machine cluster can perform 167 million key-value lookups per second with a latency of 31µs.
| Year | Citations | |
|---|---|---|
Page 1
Page 1