StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems
Qi Lin, Zhenyu Zhang, Viraj Thakkar, Zhenjie Sun, Mai Zheng, Zhichao Cao
TL;DR
StorageXTuner automates the tuning of heterogeneous storage systems with a collaborative four-agent LLM framework (Executor, Extractor, Searcher, Reflector) that splits benchmarking, data extraction, configuration search, and insight management into modular tasks. It combines an insight-driven tree search with a layered memory system to reuse validated tuning knowledge while guarding against unsafe actions, which lets it generalize across systems and versions. The authors implement a Python prototype and report substantial performance gains across RocksDB, LevelDB, CacheLib, and InnoDB, including up to 575% throughput improvements and up to 88% p99 latency reductions, along with ablation and sensitivity analyses that highlight the value of context, insights, and closed-loop validation. By introducing new evaluation metrics and a reusable multi-agent architecture, StorageXTuner provides a practical, scalable approach to LLM-driven storage tuning that applies beyond a single system or workload.
Abstract
Automatically configuring storage systems is hard: parameter spaces are large and conditions vary across workloads, deployments, and versions. Heuristic and ML tuners are often system-specific, require manual glue, and degrade when those conditions change. Recent LLM-based approaches help but usually treat tuning as a single-shot, system-specific task, which limits cross-system reuse, constrains exploration, and weakens validation. We present StorageXTuner, an LLM agent-driven auto-tuning framework for heterogeneous storage engines. StorageXTuner separates concerns across four agents: Executor (sandboxed benchmarking), Extractor (performance digest), Searcher (insight-guided configuration exploration), and Reflector (insight generation and management). The design couples an insight-driven tree search with layered memory that promotes empirically validated insights and employs lightweight checkers to guard against unsafe actions. We implement a prototype and evaluate it on RocksDB, LevelDB, CacheLib, and MySQL InnoDB with YCSB, MixGraph, and TPC-H/C. Relative to out-of-the-box settings and to ELMo-Tune, StorageXTuner reaches up to 575% and 111% higher throughput, respectively, reduces p99 latency by as much as 88% and 56%, respectively, and converges with fewer trials.
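The closed loop described above can be illustrated with a small, LLM-free Python sketch. It is not the authors' implementation: the parameter names, the bounds in `SAFE_BOUNDS`, the synthetic throughput model inside `executor`, and the two-confirmation promotion rule are all assumptions made for illustration, and a greedy descent stands in for the paper's insight-driven tree search.

```python
from dataclasses import dataclass, field

# Hypothetical parameter bounds for the safety checker (not taken from the paper).
SAFE_BOUNDS = {"write_buffer_mb": (1, 512), "max_background_jobs": (1, 16)}


def is_safe(config):
    """Lightweight checker: reject unknown keys and out-of-range values."""
    return all(
        k in SAFE_BOUNDS and SAFE_BOUNDS[k][0] <= v <= SAFE_BOUNDS[k][1]
        for k, v in config.items()
    )


def executor(config):
    """Stand-in for sandboxed benchmarking: a synthetic throughput model
    that peaks at write_buffer_mb=128 and max_background_jobs=8."""
    wb, jobs = config["write_buffer_mb"], config["max_background_jobs"]
    return {"raw_ops_per_sec": 1000.0 - abs(wb - 128) * 2 - abs(jobs - 8) * 20}


def extractor(raw):
    """Reduce raw benchmark output to a compact performance digest."""
    return {"throughput": raw["raw_ops_per_sec"]}


@dataclass
class Reflector:
    """Layered memory: insights start short-term and are promoted to
    long-term after repeated empirical confirmation."""
    short_term: dict = field(default_factory=dict)  # insight -> confirmations
    long_term: set = field(default_factory=set)
    promote_after: int = 2  # assumed threshold

    def record(self, insight, improved):
        if not improved:
            # A failed trial clears the insight's pending confirmations.
            self.short_term.pop(insight, None)
            return
        self.short_term[insight] = self.short_term.get(insight, 0) + 1
        if self.short_term[insight] >= self.promote_after:
            self.long_term.add(insight)


def searcher(config, reflector):
    """Expand one node: double or halve each parameter, keep only safe
    children, and try promoted (validated) insights first."""
    children = []
    for key in config:
        for label, factor in (("increase", 2), ("decrease", 0.5)):
            child = dict(config)
            child[key] = max(1, int(config[key] * factor))
            if child != config and is_safe(child):
                children.append((f"{label} {key}", child))
    children.sort(key=lambda c: c[0] not in reflector.long_term)
    return children


def tune(start, rounds=8):
    """Greedy closed loop: benchmark, digest, reflect, descend on improvement."""
    reflector = Reflector()
    best, best_score = start, extractor(executor(start))["throughput"]
    for _ in range(rounds):
        for insight, child in searcher(best, reflector):
            score = extractor(executor(child))["throughput"]
            improved = score > best_score
            reflector.record(insight, improved)
            if improved:
                best, best_score = child, score
                break  # descend into the improving child
        else:
            break  # no child improved: stop searching
    return best, best_score, reflector


if __name__ == "__main__":
    best, score, memory = tune({"write_buffer_mb": 16, "max_background_jobs": 2})
    print(best, score, sorted(memory.long_term))
```

In this toy setting, the loop keeps a change only after a benchmark confirms it, and an insight such as "increase write_buffer_mb" is tried first in later rounds only once it has been confirmed twice. A real deployment would replace `executor` with an actual sandboxed benchmark run and `searcher` with LLM-generated proposals.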
