Characterize LSM-tree Compaction Performance via On-Device LLM Inference

Jiabiao Ding; Yina Lv; Qiao Li; Zhirong Shen; Chun Jason Xue

TL;DR

This paper tackles real-time tuning of LSM-tree compaction via on-device LLMs and analyzes the trade-off between large cloud-scale models and small edge-scale models in terms of inference latency and tuning quality, applying the study to RocksDB v8.8.1 with db_bench workloads. It provides a detailed examination of the LSM-tree architecture, the extensive and interdependent compaction parameter space, and the on-device LLM inference workflow, highlighting how memory, KV caching, and model size influence tuning decisions. The core findings show that while large models achieve higher tuning accuracy, their latency makes them impractical for real-time use, whereas small models are fast enough for edge deployment but exhibit limitations in reasoning, formatting, and stability; a reduced input size can enable effective device-side tuning. The work advocates for lightweight, domain-specific LLMs (potentially via PEFT) for edge storage optimization and sets the stage for future research on truly real-time, on-device LSM-tree optimization.

Abstract

Modern key-value storage engines built on Log-Structured Merge-trees (LSM-trees), such as RocksDB and LevelDB, rely heavily on the performance of their compaction operations, which are impacted by a complex set of interdependent configuration parameters. Manually tuning these parameters for optimal performance demands considerable expertise, while traditional auto-tuning approaches struggle with the enormous search space and low sample efficiency inherent to this domain. In recent years, Large Language Models (LLMs) have demonstrated strong capabilities in code generation and logical reasoning, offering new possibilities for system optimization. However, applying LLMs to real-time compaction tuning in such latency-sensitive environments is a double-edged sword. While large-scale LLMs can offer superior reasoning for strategy generation, their high inference latency and computational cost make them impractical for interactive, low-latency tuning. In contrast, small-scale LLMs achieve low latency but often at the expense of reasoning accuracy and tuning effectiveness. In this paper, we first evaluate this trade-off by analyzing the compaction-tuning performance and inference latency of LLMs at different scales in an LSM-tree-based tuning case. We then characterize the performance of LSM-tree on RocksDB v8.8.1, with a focus on adjusting the key compaction-related parameters under db_bench workloads. Our experimental results show a clear positive correlation between model capability and tuning effectiveness.
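To make the tuning setup concrete, the following is a minimal sketch of how one candidate configuration could be turned into a db_bench invocation. It is not the authors' harness: the helper name `build_db_bench_cmd`, the binary path, the choice of parameters, and all values are illustrative assumptions. The flags themselves are standard db_bench options.

```python
import shlex

# Hypothetical candidate configuration, e.g. as proposed by an LLM.
# The parameter choice and values are illustrative, not taken from the paper.
candidate = {
    "write_buffer_size": 64 * 1024 * 1024,         # MemTable size in bytes
    "level0_file_num_compaction_trigger": 4,       # L0 file count that triggers compaction
    "max_bytes_for_level_base": 256 * 1024 * 1024, # target size of L1 in bytes
    "max_background_jobs": 4,                      # background flush/compaction threads
}

def build_db_bench_cmd(params, benchmarks="fillrandom", num=1_000_000,
                       binary="./db_bench"):
    """Return the argv list for one db_bench run with the given overrides."""
    argv = [binary, f"--benchmarks={benchmarks}", f"--num={num}"]
    for name in sorted(params):
        value = params[name]
        # Reject anything that is not a non-negative integer before it
        # reaches the command line.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"invalid value for {name}: {value!r}")
        argv.append(f"--{name}={value}")
    return argv

cmd = build_db_bench_cmd(candidate)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once a db_bench binary is available; each tuning iteration would then compare the reported throughput against the previous configuration.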

Paper Structure (8 sections, 5 figures, 1 table)

This paper contains 8 sections, 5 figures, 1 table.

Introduction
Background
Architecture of LSM-tree
Configurable Parameters
LLM Inference
Motivation
Related Work
Conclusion

Figures (5)

Figure 1: Architecture of a typical LSM-tree (e.g., RocksDB). Host writes are buffered in the MemTable, which is marked as an Immutable MemTable when full, flushed to immutable SSTables in $L_0$, and gradually compacted into larger, sorted SSTables in lower levels (i.e., $L_1$-$L_n$). Flush and compaction are managed by background threads.
Figure 2: Workflow of LLM inference.
Figure 3: Inference latency during each iteration. Large-scale LLM (i.e., DeepSeek-V3) tuning incurs high latency per iteration, making it unable to react to fast workload changes. (Finding #1)
Figure 4: Inference failures of 1B-parameter LLMs in LSM-tree tuning. We categorize failures into four cases: (a) Instruction format error, where the LLM output violates formatting constraints; (b) Inference interrupted due to context limits or false stop tokens; (c) Repeated output, characterized by infinite generative loops; and (d) Unstable exploration-stability trade-offs, showing the difficulty of finding an effective temperature window that balances format compliance and parameter diversity. Pangu-1B [arXiv2023pangupi] and Pangu-7B [arXiv2025pangu7B] are taken as examples. (Finding #2)
Figure 5: Performance during each iteration. Large-scale LLM (i.e., DeepSeek-V3) inference results in higher inference latency than that of small-scale LLM (i.e., Pangu-7B). (Finding #3)
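
The failure cases (a) to (c) in Figure 4 can be detected mechanically before a suggestion is applied. The sketch below is one possible check, not the paper's method. It assumes a hypothetical output contract in which the model emits one `name=value` line per parameter followed by an `END` sentinel; the allowed parameter set, the sentinel, and the repeat threshold are all assumptions. Case (d) concerns sampling temperature and is not covered.

```python
import re

# Hypothetical whitelist of tunable parameters (assumption, not from the paper).
ALLOWED = {
    "write_buffer_size",
    "level0_file_num_compaction_trigger",
    "max_bytes_for_level_base",
}
LINE_RE = re.compile(r"^([a-z0-9_]+)=(\d+)$")

def classify_output(text, end_marker="END", max_repeats=3):
    """Return (status, params) for one raw LLM response."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # (c) Repeated output: the same line appears max_repeats times in a row.
    run = 1
    for prev, cur in zip(lines, lines[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return "repeated_output", {}

    # (b) Interrupted inference: the response never reached the sentinel.
    if not lines or lines[-1] != end_marker:
        return "interrupted", {}

    # (a) Format error: any line that is not a known name=integer pair.
    params = {}
    for ln in lines[:-1]:
        match = LINE_RE.match(ln)
        if match is None or match.group(1) not in ALLOWED:
            return "format_error", {}
        params[match.group(1)] = int(match.group(2))
    if not params:
        return "format_error", {}
    return "ok", params

good = "write_buffer_size=67108864\nlevel0_file_num_compaction_trigger=4\nEND"
print(classify_output(good))
```

Only responses classified as `ok` would be forwarded to the storage engine; the other statuses can trigger a retry or a fallback to the previous configuration.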
