CAMAL: Optimizing LSM-trees via Active Learning
Weiping Yu, Siqiang Luo, Zihao Yu, Gao Cong
TL;DR
Camal tunes LSM-tree parameters for varying read/write workloads by combining a complexity-based cost model with active learning. It contributes (1) the first application of active learning to LSM-tree instance optimization, (2) a hierarchical, decoupled sampling scheme that explores parameters separately, (3) an extrapolation strategy that carries a trained model across data growth without retraining, and (4) a dynamic mode, built on a lazy-transition LSM-tree, that adapts online to workload shifts. Integrated into RocksDB, Camal improves system performance by 28% on average and up to 8x over a state-of-the-art RocksDB design, while reaching near-optimal configurations with fewer training samples and less training time.
Abstract
We use machine learning to optimize the LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach, Camal, which has the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree based key-value stores. The learning process is coupled with traditional cost models to improve training; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts an active learning paradigm based on decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model as the data size grows; (4) Dynamic Mode: Camal is able to tune the LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: by integrating Camal into a full system, RocksDB, the system performance improves by 28% on average and up to 8x compared to a state-of-the-art RocksDB design.
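To make the idea of decoupled tuning concrete, the following is a minimal sketch of a one-parameter-at-a-time search. It is not Camal's implementation: no ML model is trained, and `toy_cost`, its constants, and the candidate grids are hypothetical stand-ins for measuring a real LSM-tree under a workload. It only illustrates why tuning parameters separately needs fewer samples than a joint grid.

```python
import math

# Hypothetical constants for a toy leveled LSM-tree cost model
# (illustrative assumptions, not values from the paper).
ENTRIES_PER_PAGE = 32       # entries moved per I/O
MEMORY_BITS_PER_ENTRY = 16  # total memory budget per stored entry
ENTRY_BITS = 1024           # size of one entry


def toy_cost(size_ratio, buffer_fraction, read_fraction):
    """Expected I/Os per operation under the toy model (assumed form)."""
    # A larger write buffer means fewer levels for the same data size.
    data_to_buffer = ENTRY_BITS / (MEMORY_BITS_PER_ENTRY * buffer_fraction)
    levels = math.log(data_to_buffer) / math.log(size_ratio)
    # Memory not given to the buffer goes to Bloom filters.
    bits_per_key = MEMORY_BITS_PER_ENTRY * (1.0 - buffer_fraction)
    false_positive_rate = math.exp(-bits_per_key * math.log(2) ** 2)
    read_cost = 1.0 + levels * false_positive_rate
    write_cost = size_ratio * levels / ENTRIES_PER_PAGE
    return read_fraction * read_cost + (1.0 - read_fraction) * write_cost


def decoupled_tune(read_fraction, candidates, rounds=2):
    """Tune each parameter in turn while holding the others fixed."""
    # Start from the first candidate of every parameter.
    config = {name: values[0] for name, values in candidates.items()}
    samples = 0
    for _ in range(rounds):
        for name, values in candidates.items():
            best_value, best_cost = None, math.inf
            for value in values:
                trial = dict(config, **{name: value})
                cost = toy_cost(read_fraction=read_fraction, **trial)
                samples += 1  # each evaluation stands in for one measurement
                if cost < best_cost:
                    best_value, best_cost = value, cost
            config[name] = best_value
    return config, toy_cost(read_fraction=read_fraction, **config), samples


# Hypothetical candidate grids for two tunable parameters.
CANDIDATES = {
    "size_ratio": [2, 4, 6, 8, 10, 12],
    "buffer_fraction": [0.1, 0.3, 0.5, 0.7, 0.9],
}

if __name__ == "__main__":
    config, cost, samples = decoupled_tune(0.5, CANDIDATES)
    print(config, round(cost, 4), samples)
```

With these grids, two decoupled rounds take 22 evaluations, whereas a single pass over the joint grid would take 30; the gap widens quickly as more parameters are added. In a real setting each evaluation would be a measured workload run, which is where the sample savings matter.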
