Autumn: A Scalable Read Optimized LSM-tree based Key-Value Stores with Fast Point and Range Read Speed

Fuheng Zhao; Zach Miller; Leron Reznikov; Divyakant Agrawal; Amr El Abbadi

TL;DR

Autumn introduces Garnering, a read-optimized extension to LSM-tree stores that dynamically adjusts inter-level capacity ratios to improve point and range reads. The key idea is to fix the last-level ratio while scaling lower-level gaps by a factor $c<1$, resulting in a level count of $L=O(\sqrt{-\log_{c}(\frac{N}{B\cdot T})})$ and worst-case read costs around $O(\sqrt{-\log_{c}(\frac{N}{B\cdot T})})$, i.e., $O(\sqrt{\log N})$, with or without Bloom filters. Autumn also employs a delayed last-level compaction strategy and DRAM pinning for Level 0 to curb write amplification, and it leverages Bloom-filter optimization to minimize point-read I/Os. Empirical results on LevelDB/RocksDB show substantial read-speed gains with modest or comparable write amplification, validating Autumn’s applicability to OLTP and HTAP workloads. The work provides a principled framework for balancing read and write costs in LSM-trees, supported by theoretical analyses and substantial benchmarking.
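The square root in the level count can be recovered with a short calculation. The following is a sketch under an assumed ratio schedule, not a formula quoted from the paper: suppose Level 0 holds $B$ entries and the capacity ratio between levels $i-1$ and $i$ is $T/c^{L-i}$, so the last gap is exactly $T$ and each lower gap is larger by a factor $1/c$. The last level then holds roughly all $N$ entries:

$$N \approx B\prod_{i=1}^{L}\frac{T}{c^{L-i}} = B\cdot T^{L}\cdot c^{-L(L-1)/2}$$

Dividing by $B\cdot T$ and taking $-\log_{c}$ of both sides (with $c<1$ and $T>1$, so $-\log_{c}T>0$):

$$-\log_{c}\left(\frac{N}{B\cdot T}\right) = (L-1)\left(-\log_{c}T\right) + \frac{L(L-1)}{2} \ge \frac{(L-1)^{2}}{2}$$

Hence $L \le 1+\sqrt{-2\log_{c}(\frac{N}{B\cdot T})}$, which matches the stated $L=O(\sqrt{-\log_{c}(\frac{N}{B\cdot T})})$ under this assumption.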

Abstract

Log-Structured Merge-tree (LSM-tree) based key-value stores are widely used in storage systems to support a variety of operations such as updates, point reads, and range reads. Traditionally, an LSM-tree's merge policy organizes data into multiple levels of exponentially increasing capacity to support high-speed writes. However, we contend that the traditional merge policies are not optimized for reads. In this work, we present Autumn, a scalable and read-optimized LSM-tree based key-value store with minimal point and range read cost. The key idea in improving read performance is to dynamically adjust the capacity ratio between two adjacent levels as more data are stored. As a result, smaller levels gradually increase their capacities and merge more often. In particular, the point and range read cost improves from the previous best known $O(\log N)$ complexity to $O(\sqrt{\log N})$ in Autumn by applying the novel Garnering merge policy. While the Garnering merge policy optimizes for both point reads and range reads, it maintains high performance for updates. Moreover, to further improve update costs, Autumn uses a small, bounded amount of DRAM to pin the first level of the LSM-tree. We implemented Autumn on top of LevelDB and experimentally showcase the performance gains for real-world workloads.
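To make the $O(\log N)$ versus $O(\sqrt{\log N})$ contrast concrete, the sketch below counts levels for a fixed-ratio (Leveling-style) layout and for a Garnering-style layout. It rests on assumptions that are not taken from the paper's text: the buffer holds `buffer_entries` entries, and the Garnering ratio between levels $i-1$ and $i$ is modeled as $T/c^{L-i}$. The function and parameter names are hypothetical.

```python
def leveling_levels(n_entries, buffer_entries, size_ratio):
    """Levels needed when every adjacent pair of levels has the same ratio T."""
    levels = 1
    capacity = buffer_entries * size_ratio
    while capacity < n_entries:
        capacity *= size_ratio
        levels += 1
    return levels


def garnering_levels(n_entries, buffer_entries, size_ratio, c):
    """Smallest L whose last level can hold n_entries.

    Assumed model (not from the paper's text): the ratio between
    level i-1 and level i is T / c**(L - i), so the last gap is T
    and every lower gap is larger by a factor of 1/c.
    """
    levels = 1
    while True:
        capacity = buffer_entries
        for i in range(1, levels + 1):
            capacity *= size_ratio / c ** (levels - i)
        if capacity >= n_entries:
            return levels
        levels += 1


# Illustrative parameters only; c = 0.8 mirrors the setting named in Figure 2.
for n in (10**7, 10**9, 10**12, 10**15):
    print(n, leveling_levels(n, 1000, 10), garnering_levels(n, 1000, 10, 0.8))
```

Because the lower ratios depend on $L$, adding a level under this model enlarges every smaller level, which is one way to read the abstract's remark that smaller levels gradually increase their capacities as more data are stored.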

Paper Structure (19 sections, 13 equations, 5 figures, 3 tables)

This paper contains 19 sections, 13 equations, 5 figures, 3 tables.

Introduction
LSM-tree Background
Concurrency Control and Recovery
LSM-tree Amplification
Merge Policies
Leveling and Tiering
Other Merge Policies
Autumn
A New Design
Further Improvements in Performance
Experiments
Implementation Details
Micro Benchmarks
Different operations with varying key-value sizes
Write and Read Sensitivity to $c$ and $T$
...and 4 more sections

Figures (5)

Figure 1: An illustration of the last three largest levels with QLSM-Bush, Tiering, Lazy-Leveling, Leveling, and our proposed Garnering. The left is more write optimized. The right is more read optimized.
Figure 2: Evaluation results for RocksDB and Autumn ($c=0.8$) on six different operations using the db_bench micro benchmark.
Figure 3: The effect of $c$ and $T$ on small range query.
Figure 4: YCSB Workloads Performance. This figure shows the performance comparison between Autumn and RocksDB on various YCSB core workloads. The x-axis presents the different workload names, and the y-axis shows the throughput using a base unit of a thousand ops per second (kops/sec). The actual throughput (kops/sec) achieved is placed on top of each bar.
Figure 5: Performance on db_bench macro benchmarks with varying database size and different operations. Performance is measured in average latency.
