Characterize LSM-tree Compaction Performance via On-Device LLM Inference
Jiabiao Ding, Yina Lv, Qiao Li, Zhirong Shen, Chun Jason Xue
TL;DR
This paper tackles real-time tuning of LSM-tree compaction with on-device LLMs. It analyzes the trade-off between large cloud-scale models and small edge-scale models in terms of inference latency and tuning quality, and applies the study to RocksDB v8.8.1 with db_bench workloads. It examines the LSM-tree architecture, the large and interdependent space of compaction parameters, and the on-device LLM inference workflow, highlighting how memory, KV caching, and model size influence tuning decisions. The core finding is that large models achieve higher tuning accuracy, but their latency makes them impractical for real-time use. Small models are fast enough for edge deployment but show limitations in reasoning, output formatting, and stability. Reducing the input size can make device-side tuning effective. The work advocates lightweight, domain-specific LLMs (potentially obtained through parameter-efficient fine-tuning, PEFT) for edge storage optimization and lays groundwork for future research on truly real-time, on-device LSM-tree optimization.
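Input size matters on-device partly because a decoder-only transformer's KV cache grows linearly with the number of tokens it holds. The sketch below applies the standard estimate: 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element. The model shape is a hypothetical edge-scale placeholder, not one of the models evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder-only transformer dimensions (illustrative only)."""
    num_layers: int
    num_kv_heads: int  # smaller than the attention-head count under grouped-query attention
    head_dim: int


def kv_cache_bytes(shape: ModelShape, num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer, KV head, and token.

    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * num_tokens * bytes_per_elem


def to_mib(num_bytes: int) -> float:
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    small = ModelShape(num_layers=24, num_kv_heads=4, head_dim=64)  # hypothetical shape
    for tokens in (512, 4096):
        print(f"{tokens:>5} tokens: {to_mib(kv_cache_bytes(small, tokens)):.1f} MiB")
```

Under these assumptions, a 512-token prompt needs about 12 MiB of KV cache and a 4096-token prompt needs about 96 MiB. Trimming the tuning context therefore reduces memory pressure on the device roughly in proportion to the number of tokens removed.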
Abstract
Modern key-value storage engines built on Log-Structured Merge-trees (LSM-trees), such as RocksDB and LevelDB, depend heavily on the performance of compaction, which is governed by a large set of interdependent configuration parameters. Manually tuning these parameters demands considerable expertise, while traditional auto-tuning approaches struggle with the enormous search space and the low sample efficiency inherent to this domain. In recent years, Large Language Models (LLMs) have demonstrated strong capabilities in code generation and logical reasoning, opening new possibilities for system optimization. However, applying LLMs to real-time compaction tuning in such latency-sensitive environments is a double-edged sword. Large-scale LLMs offer stronger reasoning for strategy generation, but their high inference latency and computational cost make them impractical for interactive, low-latency tuning. Small-scale LLMs achieve low latency, but often at the expense of reasoning accuracy and tuning effectiveness. In this paper, we first evaluate this trade-off by measuring the compaction-tuning quality and inference latency of LLMs at different scales in an LSM-tree tuning case study. We then characterize LSM-tree performance in RocksDB v8.8.1, focusing on adjustments to the key compaction-related parameters under db_bench workloads. Our experimental results show a clear positive correlation between model capability and tuning effectiveness.
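Small-scale LLMs may produce malformed or out-of-range suggestions. For that reason, a tuning loop would typically validate a model's proposal before passing it to the benchmark. The sketch below assumes the model is prompted to reply with a JSON object that maps db_bench flag names to integer values. The allow-list is a hypothetical subset of db_bench compaction-related flags, and its bounds are illustrative guardrails, not recommended settings or the paper's method.

```python
import json

# Hypothetical allow-list: a subset of integer-valued db_bench flags related to
# compaction and write buffering, with illustrative (not recommended) bounds.
ALLOWED_FLAGS = {
    "level0_file_num_compaction_trigger": (1, 64),
    "max_background_compactions": (1, 32),
    "max_write_buffer_number": (2, 16),
    "target_file_size_base": (1 << 20, 1 << 30),
    "write_buffer_size": (1 << 20, 1 << 30),
}


def parse_llm_suggestion(raw: str) -> dict[str, int]:
    """Parse an LLM reply expected to be a JSON object of flag -> integer.

    Unparseable replies yield {}; unknown flags, non-integer values, and
    out-of-range values are dropped instead of being passed to db_bench.
    """
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(proposal, dict):
        return {}
    accepted: dict[str, int] = {}
    for name, value in proposal.items():
        bounds = ALLOWED_FLAGS.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if bounds is None or isinstance(value, bool) or not isinstance(value, int):
            continue
        low, high = bounds
        if low <= value <= high:
            accepted[name] = value
    return accepted


def to_db_bench_args(params: dict[str, int]) -> list[str]:
    """Render accepted parameters as db_bench-style --flag=value arguments."""
    return [f"--{name}={value}" for name, value in sorted(params.items())]
```

For example, a reply such as `{"max_background_compactions": 4, "write_buffer_size": 67108864, "bogus_flag": 1, "level0_file_num_compaction_trigger": 500}` keeps only the first two entries. These render as `--max_background_compactions=4` and `--write_buffer_size=67108864`. Dropping bad entries, rather than rejecting the whole reply, is one possible design choice. A stricter loop could instead re-prompt the model.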
