TLDR

SSDs are rapidly replacing HDDs as primary storage for key-value stores, yet deploying LSM-tree-based key-value stores on commercial SSDs causes heavy write amplification and garbage-collection overhead: redundant management layers in the LSM-tree, file system, and flash translation layer keep the key-value store and the flash device unaware of each other. FlashKV is an LSM-tree-based key-value store that runs directly on open-channel SSDs to eliminate these inefficiencies. By managing raw flash devices at the application layer, FlashKV removes the redundant layers. Using LSM-tree domain knowledge and open-channel information, it applies a parallel data layout that exploits the device's internal parallelism, and it optimizes compaction, caching, and I/O scheduling. Evaluations show that FlashKV improves performance by 1.5× to 4.5× and cuts write traffic by up to 50% under heavy write workloads, compared to LevelDB.

Abstract

As the cost-per-bit of solid-state disks decreases quickly, SSDs are supplanting HDDs in many cases, including as the primary storage of key-value stores. However, simply deploying LSM-tree-based key-value stores on commercial SSDs is inefficient and induces heavy write amplification and severe garbage collection overhead under write-intensive conditions. The main cause of these issues is the triply redundant management functionality in the LSM-tree, file system, and flash translation layer, which blocks awareness between key-value stores and flash devices. Furthermore, we observe that eliminating these redundant layers alone improves the performance of LSM-tree-based key-value stores only slightly, because the I/O stack, including the cache and scheduler, is not optimized for the LSM-tree's unique I/O patterns. To address these issues, we propose FlashKV, an LSM-tree-based key-value store running on open-channel SSDs. FlashKV eliminates the redundant management and semantic isolation by directly managing the raw flash devices in the application layer. With the domain knowledge of the LSM-tree and the open-channel information, FlashKV employs a parallel data layout to exploit the internal parallelism of the flash device, and optimizes the compaction, caching, and I/O scheduling mechanisms specifically for it. Evaluations show that FlashKV improves system performance by 1.5× to 4.5× and reduces write traffic by up to 50% under heavy write conditions, compared to LevelDB.
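
To make the idea of a parallel data layout more concrete, the following is a minimal sketch of round-robin striping, in which consecutive pages of one file are spread across flash channels so that they can be read or written concurrently. The abstract does not describe FlashKV's actual layout, so the striping policy, the function names `stripe_pages` and `pages_per_channel`, and the parameters are all illustrative assumptions rather than the authors' design.

```python
from typing import List, Tuple


def stripe_pages(num_pages: int, num_channels: int,
                 start_channel: int = 0) -> List[Tuple[int, int]]:
    """Map each logical page of a file to (channel, offset in that channel).

    Hypothetical round-robin policy: page i goes to channel
    (start_channel + i) mod num_channels, so neighbouring pages land on
    different channels and can be accessed in parallel.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    placement = []
    for page in range(num_pages):
        channel = (start_channel + page) % num_channels
        offset = page // num_channels  # index of the stripe row
        placement.append((channel, offset))
    return placement


def pages_per_channel(placement: List[Tuple[int, int]],
                      num_channels: int) -> List[int]:
    """Count how many pages each channel holds, to check load balance."""
    counts = [0] * num_channels
    for channel, _ in placement:
        counts[channel] += 1
    return counts


if __name__ == "__main__":
    layout = stripe_pages(num_pages=8, num_channels=4)
    print(layout)
    print(pages_per_channel(layout, 4))
```

Under this assumed policy, a file of 8 pages on a 4-channel device places two pages on every channel, so a sequential scan of the file can keep all channels busy at once.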
