LearnedKV: Integrating LSM and Learned Index for Superior Performance on Storage

Wenlong Wang, David Hung-Chang Du

TL;DR

LearnedKV tackles the problem of balancing read efficiency and write throughput in key-value stores by decoupling the read path from the write path. It introduces a tiered architecture that uses an LSM tree for recent writes and a read-optimized Learned Index built during non-blocking garbage collection, drastically reducing LSM size and accelerating reads. The key contributions include a non-blocking GC-driven conversion from LSM to a Learned Index, the Greedy-PLR+ on-storage index design, and multiple optimizations such as a Bloom filter and range-query assisted conversion. Empirically, LearnedKV delivers up to 4.32x read and 1.43x write performance gains across SSD/HDD, diverse distributions, and real-world datasets, highlighting its robustness and practical impact for large-scale KV storage systems.
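The Greedy-PLR+ index mentioned above builds on piecewise linear regression (PLR): the sorted key space is greedily partitioned into linear segments, each guaranteeing that its predicted position is within a fixed error bound of the true position, so a lookup only needs a tiny final search. The sketch below is a minimal, simplified illustration of that idea, not the paper's actual Greedy-PLR+ implementation; the function names, the O(n²) greedy check, and the epsilon bound are illustrative assumptions.

```python
import bisect

def build_plr(keys, epsilon):
    """Greedily fit linear segments over sorted keys so that every key's
    predicted array position is within +/- epsilon of its true position.
    Returns a list of (start_key, slope, base_position) segments.
    (Simplified sketch; real greedy PLR uses an O(n) slope-bound method.)"""
    segments, i, n = [], 0, len(keys)
    while i < n:
        j, slope = i + 1, 0.0
        while j < n:
            cand = (j - i) / (keys[j] - keys[i])
            # accept the longer segment only if all covered keys stay in bound
            if not all(abs(i + cand * (keys[k] - keys[i]) - k) <= epsilon
                       for k in range(i, j + 1)):
                break
            slope, j = cand, j + 1
        segments.append((keys[i], slope, i))
        i = j
    return segments

def lookup(segments, keys, key, epsilon):
    """Predict a position with the covering segment, then search only the
    small +/- epsilon error window around the prediction."""
    idx = max(0, bisect.bisect_right([s[0] for s in segments], key) - 1)
    start_key, slope, base = segments[idx]
    pred = base + int(slope * (key - start_key))
    lo, hi = max(0, pred - epsilon), min(len(keys) - 1, pred + epsilon)
    for k in range(lo, hi + 1):
        if keys[k] == key:
            return k
    return None
```

Because the error bound is fixed at build time, each point lookup costs one model prediction plus a constant-size probe, which is what makes a learned index attractive as the read-optimized tier.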

Abstract

We present LearnedKV, a novel tiered key-value store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index to achieve superior read and write performance on storage systems. While existing approaches use learned indexes primarily as auxiliary components within LSM trees, LearnedKV employs a two-tier design where the LSM tree handles recent write operations while a separate Learned Index accelerates read performance. Our design includes a non-blocking conversion mechanism that efficiently transforms LSM data into a Learned Index during garbage collection, maintaining high performance without interrupting operations. LearnedKV dramatically reduces LSM size through this tiered approach, leading to significant performance gains in both reads and writes. Extensive evaluations across diverse workloads show that LearnedKV outperforms state-of-the-art LSM-based solutions by up to 4.32x for read operations and 1.43x for writes. The system demonstrates robust performance across different data distributions, access patterns, and storage media including both SSDs and HDDs.
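The two-tier read/write split described in the abstract can be sketched in a few lines: recent writes land in a small write-optimized tier, lookups consult that tier first (so newer values shadow older ones), and a GC-style conversion periodically migrates data into a read-optimized sorted tier. The class below is a toy, single-threaded stand-in, assuming a dict for the LSM tree and a sorted array with binary search in place of the actual on-storage Learned Index; the paper's conversion is non-blocking, which this sketch does not model.

```python
import bisect

class TieredKV:
    """Toy two-tier store: a dict stands in for the LSM tree (recent
    writes); a sorted array stands in for the read-optimized tier."""

    def __init__(self, gc_threshold=4):
        self.lsm = {}                            # write-optimized tier
        self.idx_keys, self.idx_vals = [], []    # read-optimized tier
        self.gc_threshold = gc_threshold

    def put(self, key, value):
        self.lsm[key] = value
        if len(self.lsm) >= self.gc_threshold:
            self._gc_convert()   # in the paper this runs without blocking

    def _gc_convert(self):
        # merge LSM contents into the sorted tier; newest value wins
        merged = dict(zip(self.idx_keys, self.idx_vals))
        merged.update(self.lsm)
        self.idx_keys = sorted(merged)
        self.idx_vals = [merged[k] for k in self.idx_keys]
        self.lsm.clear()         # the LSM tier shrinks back to empty

    def get(self, key):
        if key in self.lsm:      # recent writes shadow older data
            return self.lsm[key]
        i = bisect.bisect_left(self.idx_keys, key)
        if i < len(self.idx_keys) and self.idx_keys[i] == key:
            return self.idx_vals[i]
        return None
```

The key property the design exploits is visible even in this toy: keeping the LSM tier small bounds the cost of the write path, while most reads are served by the read-optimized tier.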

Paper Structure

This paper contains 25 sections, 15 figures, 7 tables.

Figures (15)

  • Figure 1: RocksDB architecture. In this figure and following sections, $L_0$ represents the highest level and $L_n$ represents the lowest level.
  • Figure 2: Bourbon lookup process
  • Figure 3: Read and write time cost of RocksDB across different stages
  • Figure 4: Using read-optimized index (Learned Index) to index the "static" data
  • Figure 5: LearnedKV Architecture
  • ...and 10 more figures