KV-Tandem -- a Modular Approach to Building High-Speed LSM Storage Engines

Edward Bortnikov, Michael Azran, Asa Bornstein, Shmuel Dashevsky, Dennis Huang, Omer Kepten, Michael Pan, Gali Sheffi, Moshe Twitto, Tamar Weiss Orzech, Idit Keidar, Guy Gueta, Roey Maor, Niv Dayan

TL;DR

KV-Tandem is a modular architecture for building LSM-based storage engines on top of simple, non-ordered persistent key-value stores (KVSs). It enables advanced functionality such as range queries and snapshot reads while maintaining the native KVS performance for random reads and writes.

Abstract

We present KV-Tandem, a modular architecture for building LSM-based storage engines on top of simple, non-ordered persistent key-value stores (KVSs). KV-Tandem enables advanced functionalities such as range queries and snapshot reads, while maintaining the native KVS performance for random reads and writes. Its modular design offers better performance trade-offs compared to previous KV-separation solutions, which struggle to decompose the monolithic LSM structure. Central to KV-Tandem is LSM bypass, a novel algorithm that offers a fast path to basic operations while ensuring the correctness of advanced APIs. We implement KV-Tandem in XDP-Rocks, a RocksDB-compatible storage engine that leverages the XDP KVS and incorporates practical design optimizations for real-world deployment. Through extensive microbenchmark and system-level comparisons, we demonstrate that XDP-Rocks achieves 3x to 4x performance improvements over RocksDB across various workloads. XDP-Rocks is already deployed in production, delivering significant operator cost savings consistent with these performance gains.
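
To make the division of labor concrete, the following is a minimal toy sketch, not the paper's LSM bypass algorithm. It assumes an in-memory dict standing in for the unordered KVS and a sorted key list standing in for the LSM tree; the class name `ToyTandem` and all of its methods are hypothetical. It only illustrates the idea that point reads can go straight to the KVS while ordered scans consult a separate key index, and it ignores snapshots, concurrency, persistence, and any write-path bypass.

```python
import bisect


class ToyTandem:
    """Toy illustration only: an unordered value store plus an ordered key index.

    Hypothetical names throughout; this is not the XDP-Rocks or RocksDB API.
    """

    def __init__(self):
        self.kvs = {}        # stand-in for the unordered persistent KVS
        self.key_index = []  # sorted keys; stand-in for the LSM tree

    def put(self, key, value):
        # Assumption: every new key is also recorded in the ordered index.
        if key not in self.kvs:
            bisect.insort(self.key_index, key)
        self.kvs[key] = value

    def get(self, key):
        # Fast path: a point read touches only the KVS, never the index.
        return self.kvs.get(key)

    def delete(self, key):
        if key in self.kvs:
            del self.kvs[key]
            pos = bisect.bisect_left(self.key_index, key)
            del self.key_index[pos]

    def scan(self, start, end):
        # Ordered path: walk the key index over [start, end),
        # then fetch each value from the KVS.
        lo = bisect.bisect_left(self.key_index, start)
        hi = bisect.bisect_left(self.key_index, end)
        return [(k, self.kvs[k]) for k in self.key_index[lo:hi]]
```

In this sketch the cost of `get` is independent of the ordered index, which is the property the abstract attributes to the fast path; how the real system keeps the two structures consistent is exactly what the LSM bypass algorithm addresses and is not modeled here.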

Paper Structure

This paper contains 48 sections, 14 figures, 3 algorithms.

Figures (14)

  • Figure 1: KV-Tandem high-level architecture: providing a storage engine API (RocksDB-compatible) atop a KVS. Values are embedded in the KVS directly, while their keys are embedded indirectly through the LSM tree. The LSM bypass offers a fast-path for point queries, while iterators and snapshots go through the LSM. The KVS GC and the LSM compaction are independent.
  • Figure 2: BlobDB's unbounded storage growth.
  • Figure 3: Maximal throughput and latency
  • Figure 4: Throughput dynamics
  • Figure 6: Uniform distribution
  • ...and 9 more figures