How to Grow an LSM-tree? Towards Bridging the Gap Between Theory and Practice

Dingheng Mo, Siqiang Luo, Stratos Idreos

TL;DR

This work reexamines the two canonical LSM-tree growth schemes, vertical and horizontal, and identifies gaps between theory and practice. It introduces Horizontal-Tiering, the first horizontal scheme compatible with tiering, and proves that it is optimal for minimizing read cost when the number of levels is fixed. Building on this, Vertiorizon combines a top horizontal part with a bottom vertical part to balance reads, writes, and space, and includes self-tuning to adapt to diverse workloads, including skewed distributions. Empirical evaluation in RocksDB shows Vertiorizon achieving substantial throughput gains (up to about 3.2x) and markedly reduced space amplification compared with horizontal schemes, while maintaining robust worst-case performance. The proposed approach provides a practical, adaptive backbone for LSM-tree implementations and offers a wide Pareto frontier for read/write/space trade-offs.

Abstract

LSM-tree based key-value stores are widely adopted as the data storage backend in modern big data applications. An LSM-tree grows with data ingestion either by adding levels with fixed level capacities (dubbed the vertical scheme) or by increasing level capacities with a fixed number of levels (dubbed the horizontal scheme). The vertical scheme leads the trend in recent system designs, including RocksDB, LevelDB, and WiredTiger, whereas the horizontal scheme has seen declining adoption in industry. The growth scheme profoundly impacts LSM system performance in several respects, including read, write, and space costs. This paper offers new insight into a fundamental design question: how should an LSM-tree grow to attain more desirable performance? Our analysis highlights the limitations of the vertical scheme in achieving an optimal read-write trade-off and of the horizontal scheme in managing space cost effectively. Building on this analysis, we present a novel approach, Vertiorizon, which combines the strengths of the vertical and horizontal schemes to achieve a better balance among lookup, update, and space costs. Its adaptive design makes it compatible with a wide spectrum of workloads. Compared to the vertical scheme, Vertiorizon significantly improves the read-write performance trade-off. Compared to the horizontal scheme, Vertiorizon greatly extends the trade-off range through a non-trivial generalization of Bentley and Saxe's theory, while substantially reducing space costs. When integrated with RocksDB, Vertiorizon demonstrates better write performance than the vertical scheme while incurring about one-sixth of the additional space cost of the horizontal scheme.
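To make the two growth schemes concrete, the sketch below contrasts them under a deliberately simplified cost model that is our assumption, not the paper's: level i of the vertical tree has the fixed capacity B·T^i (buffer size B, size ratio T), while the horizontal tree keeps L levels and approximates its common size ratio as T = (N/B)^(1/L). The function names `vertical_levels` and `horizontal_size_ratio` are hypothetical and ignore tiering, Bloom filters, and Vertiorizon's hybrid layout.

```python
import math


def vertical_levels(n_entries: int, buffer_entries: int, size_ratio: int) -> int:
    """Vertical growth (simplified model): level i has the fixed capacity
    buffer_entries * size_ratio**i, so the tree grows by adding levels.
    Returns the number of levels needed to hold n_entries."""
    if buffer_entries <= 0 or size_ratio < 2:
        raise ValueError("need buffer_entries > 0 and size_ratio >= 2")
    levels, total_capacity = 0, 0
    while total_capacity < n_entries:
        levels += 1
        total_capacity += buffer_entries * size_ratio ** levels
    return levels


def horizontal_size_ratio(n_entries: int, buffer_entries: int, num_levels: int) -> float:
    """Horizontal growth (simplified model): the number of levels is fixed,
    so the capacity ratio between adjacent levels must grow with the data.
    Approximates the ratio as T = (N / B) ** (1 / L)."""
    if buffer_entries <= 0 or num_levels < 1:
        raise ValueError("need buffer_entries > 0 and num_levels >= 1")
    return (n_entries / buffer_entries) ** (1.0 / num_levels)


# As the data grows 100x, the vertical tree gains levels while the
# horizontal tree keeps 3 levels and raises its size ratio instead.
for n in (10**6, 10**8):
    print(n, vertical_levels(n, 100, 10), round(horizontal_size_ratio(n, 100, 3), 1))
```

In this toy model, vertical growth keeps each merge cheap but adds a level (and thus another run to probe) whenever data outgrows the current capacity. Horizontal growth keeps the probe count fixed but lets per-level capacities, and hence merge sizes, grow with the data.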

Paper Structure

This paper contains 21 sections, 8 theorems, 12 equations, 10 figures, 3 tables, and 2 algorithms.

Key Result

Lemma 1

Under the horizontal-tiering scheme, if we initially set the compaction counters of all levels to $k$, then after ${{k + \ell - 1} \choose {\ell}}$ buffer flushes, the compaction counters of all levels will decrease to zero.
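To get a feel for the bound in Lemma 1, the snippet below simply evaluates ${{k + \ell - 1} \choose {\ell}}$ for a few parameter choices. It does not simulate the horizontal-tiering compaction policy itself. The function name is hypothetical, and $\ell$ is read as the number of levels, which is our assumption.

```python
from math import comb


def flushes_until_counters_exhausted(k: int, num_levels: int) -> int:
    """Evaluate the bound stated in Lemma 1: with every compaction counter
    initialised to k, all counters reach zero after C(k + l - 1, l) buffer
    flushes. Here l is assumed to be the number of levels."""
    if k < 1 or num_levels < 1:
        raise ValueError("k and num_levels must be positive")
    return comb(k + num_levels - 1, num_levels)


for k, levels in [(2, 1), (3, 2), (4, 3)]:
    print(k, levels, flushes_until_counters_exhausted(k, levels))
```

With a single level the expression reduces to k flushes, consistent with one counter being decremented once per flush. For a fixed number of levels the count grows polynomially in k (roughly as $k^\ell / \ell!$ for large k).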

Figures (10)

  • Figure 1: Different growth schemes of LSM-trees.
  • Figure 2: Running examples of the vertical and horizontal growth schemes.
  • Figure 3: Illustrating the cost of different schemes.
  • Figure 3: Rankings of each method across different metrics for each case in the large-memory experiment figure (rounded to the nearest tenth).
  • Figure 4: Different compactions lead to varying read costs.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7