vLSM: Low tail latency and I/O amplification in LSM-based KV stores

Giorgos Xanthakis, Antonios Katsarakis, Giorgos Saloustros, Angelos Bilas

TL;DR

vLSM targets the persistent tail latency problem in LSM-based KV stores. It models tail latency through compaction chains and proposes a design that reduces both chain width and chain length without increasing I/O amplification or memory use. It eliminates tiering in L0, uses smaller SSTs, raises the growth factor between L1 and L2 to Phi, and introduces overlap-aware SSTs (vSSTs) in L1 to contain the resulting merge amplification. Experiments show up to 4.8x lower P99 write latency and up to 12.5x lower P99 read latency. They also show up to 60% fewer cumulative write stalls and slightly lower I/O amplification at the same memory budget. The approach offers a practical path to lower tail latency in production KV stores within fixed memory and I/O constraints, which makes it suitable for latency-sensitive, user-facing applications.
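The trade-off behind a larger L1 to L2 growth factor can be illustrated with standard leveled-LSM arithmetic. A larger ratio between two levels means fewer levels are needed for the same dataset. Fewer levels mean shorter compaction chains. However, each L1 SST then overlaps more L2 SSTs, which raises merge amplification. The sketch below is a minimal illustration under our own assumptions: equal-size SSTs, uniformly spread keys, L0 not counted, a 1000 GB dataset, and a 256 MB L1. The value 40 is an arbitrary stand-in for a "larger than typical" factor and is not the paper's Phi. The functions `level_capacities` and `ssts_per_merge` are hypothetical helpers.

```python
# Hypothetical sketch: leveled-LSM geometry with per-level growth factors.
# Sizes, factors, and names are illustrative assumptions, not vLSM's values.

MB = 2**20
GB = 2**30


def level_capacities(l1_bytes, growth_factors, total_bytes):
    """Capacities of L1, L2, ... until the levels can hold total_bytes.

    growth_factors[0] is the L1->L2 size ratio, growth_factors[1] the
    L2->L3 ratio, and so on; the last entry is reused for deeper levels.
    L0 is not modeled.
    """
    caps = [l1_bytes]
    while sum(caps) < total_bytes:
        idx = min(len(caps) - 1, len(growth_factors) - 1)
        caps.append(caps[-1] * growth_factors[idx])
    return caps


def ssts_per_merge(growth_factor):
    """Approximate number of SSTs rewritten when one SST moves down a level.

    Assumes equal-size SSTs and uniformly spread keys. Under those
    assumptions, one SST overlaps about growth_factor SSTs in the next
    level, and the moving SST itself is counted as well.
    """
    return growth_factor + 1


# Uniform growth factor of 10 versus a larger (hypothetical) L1->L2 factor of 40.
uniform = level_capacities(256 * MB, [10], 1000 * GB)
wide_l1 = level_capacities(256 * MB, [40, 10], 1000 * GB)

print(len(uniform), ssts_per_merge(10))  # prints: 5 11
print(len(wide_l1), ssts_per_merge(40))  # prints: 4 41
```

In this toy setting, the larger L1 to L2 ratio removes one level. The cost is that a naive L1 to L2 merge touches about 41 SSTs instead of about 11. That increase is the merge amplification that the overlap-aware vSSTs are described as containing.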

Abstract

LSM-based key-value (KV) stores are an important component in modern data infrastructures. However, they suffer from high tail latency, on the order of several seconds, which makes them less attractive for user-facing applications. In this paper, we introduce the notion of compaction chains and analyse how they affect tail latency. We then show that modern designs reduce tail latency either by trading off I/O amplification or by requiring large amounts of memory. Based on our analysis, we present vLSM, a new KV store design that improves tail latency significantly without compromising on memory or I/O amplification. vLSM reduces (a) compaction chain width, by using small SSTs and eliminating the tiering compaction that modern systems require in L0, and (b) compaction chain length, by using a larger-than-typical growth factor between L1 and L2 and introducing overlap-aware SSTs in L1. We implement vLSM in RocksDB and evaluate it using db_bench and YCSB. Our evaluation highlights the underlying trade-off among memory requirements, I/O amplification, and tail latency, as well as the advantage of vLSM over current approaches. vLSM improves P99 tail latency by up to 4.8x for writes and by up to 12.5x for reads and reduces cumulative write stalls by up to 60%, while also slightly improving I/O amplification at the same memory budget.
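The abstract ties tail latency to the width and length of compaction chains. The toy model below is offered only as an intuition aid. It rests on our own simplified reading, not the paper's definitions. In this reading, a write that needs space in L0 may wait on a cascade of dependent compactions. Chain length is the number of steps in that cascade. Chain width is the number of bytes merged by the widest step. The names `CompactionStep`, `chain_length`, `chain_width`, and `chain_stall_seconds` are hypothetical. The SST counts, SST sizes, and device bandwidth are made-up illustrative values.

```python
# Hypothetical toy model of a compaction chain; not the paper's formulation.
from dataclasses import dataclass

MB = 2**20


@dataclass
class CompactionStep:
    src_level: int    # level whose data is pushed down in this step
    ssts_merged: int  # input SSTs merged in this step
    sst_bytes: int    # size of each SST in bytes


def chain_length(chain):
    """Number of dependent compaction steps in the chain."""
    return len(chain)


def chain_width(chain):
    """Bytes merged by the widest step of the chain."""
    return max(step.ssts_merged * step.sst_bytes for step in chain)


def chain_stall_seconds(chain, device_bytes_per_sec):
    """Worst-case wait for a write blocked behind the whole chain.

    Assumes that steps run back to back, that every merged SST is read
    once and written once, and that device bandwidth is the only
    bottleneck.
    """
    io_bytes = sum(2 * step.ssts_merged * step.sst_bytes for step in chain)
    return io_bytes / device_bytes_per_sec


# Large SSTs cascading over three levels versus small SSTs over two levels.
wide_long = [CompactionStep(lvl, 11, 64 * MB) for lvl in range(3)]
narrow_short = [CompactionStep(lvl, 11, 8 * MB) for lvl in range(2)]

bandwidth = 1000 * MB  # assumed device bandwidth in bytes per second
print(chain_length(wide_long), chain_stall_seconds(wide_long, bandwidth))
print(chain_length(narrow_short), chain_stall_seconds(narrow_short, bandwidth))
```

With these made-up numbers, the wide, long chain blocks a write for about 4.2 s, while the narrow, short chain blocks it for about 0.35 s. This shows why shrinking both dimensions can cut the tail. It does not reproduce the paper's measurements.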

Paper Structure (22 sections, 13 figures, 1 table)

Figures (13)

  • Figure 1: RocksDB throughput and P99 latency in YCSB Load A, using 350M KV pairs with a KV size of 240 B.
  • Figure 2: RocksDB chain width (left) and chain length (right) for different SST sizes.
  • Figure 3: The main design points for mitigating high tail latency.
  • Figure 4: Impact on I/O amplification in RocksDB of (a) compacting a single SST between $L_0$ and $L_1$ when not maintaining a growth factor between $L_0$ and $L_1$ and (b) the number of LSM levels when using 8 MB SSTs emulating LSMi.
  • Figure 5: Modified YCSB to measure tail latency in an open-loop manner at a controlled (fixed) request rate.