UpLIF: An Updatable Self-Tuning Learned Index Framework
Alireza Heidari, Amirhossein Ahmadi, Wei Zhang
TL;DR
UpLIF tackles the challenge of updating learned indexes without frequent retraining by introducing a modular, self-tuning framework. It combines a linear-model adjustment $M'(k)=\Gamma(k)M(k)+r(k)$ with the Balanced Model Adjustment Tree (BMAT) and an Update Placeholder (Nullifier) to absorb updates while preserving index efficiency; an RL-based optimizer tunes BMAT type and retraining decisions. Empirical results show UpLIF delivers up to $3.12\times$ throughput improvements and up to $1000\times$ lower memory usage compared with baselines, and remains robust under distribution shifts and across large-scale datasets. The approach is generic to any sorted-key learned index, enabling practical, scalable dynamic indexing for modern workloads.
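The adjustment $M'(k)=\Gamma(k)M(k)+r(k)$ can be illustrated with a minimal sketch. This is not UpLIF's implementation: the names `fit_linear`, `AdjustedIndex`, `gamma`, and `r` are hypothetical, $\Gamma(k)$ is assumed to be the constant 1, and $r(k)$ is assumed to be the number of post-training inserts smaller than $k$. BMAT, the Nullifier, and the RL-based tuner are not modeled; a sorted side list stands in for the structure that absorbs updates.

```python
import bisect


def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept.

    Expects sorted keys with at least two distinct values.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_p - slope * mean_k


class AdjustedIndex:
    """Hypothetical illustration of M'(k) = Gamma(k) * M(k) + r(k)."""

    def __init__(self, keys):
        self.base = sorted(keys)
        # The base model M is trained once and never retrained in this sketch.
        self.slope, self.intercept = fit_linear(self.base)
        self.inserted = []  # sorted keys added after training

    def M(self, k):
        """Original model: predicted position of k among the base keys."""
        return self.slope * k + self.intercept

    def gamma(self, k):
        """Assumption: no rescaling of the base prediction."""
        return 1.0

    def r(self, k):
        """Assumption: shift by the number of inserted keys smaller than k."""
        return bisect.bisect_left(self.inserted, k)

    def M_adj(self, k):
        """Adjusted model M'(k) = Gamma(k) * M(k) + r(k)."""
        return self.gamma(k) * self.M(k) + self.r(k)

    def insert(self, k):
        """Absorb an update without touching the trained model."""
        bisect.insort(self.inserted, k)


idx = AdjustedIndex([10, 20, 30, 40, 50])
before = idx.M_adj(30)
idx.insert(15)
idx.insert(25)
after = idx.M_adj(30)
```

In this toy setting the prediction for key 30 moves from about position 2 to about position 4, which is where 30 sits once 15 and 25 are merged into the sorted data, and the base model's slope and intercept are left unchanged.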
Abstract
The emergence of learned indexes has caused a paradigm shift in our perception of indexing by treating indexes as predictive models that estimate keys' positions within a data set, resulting in notable improvements in key search efficiency and index size reduction. However, a significant challenge inherent in learned index modeling is its limited support for update operations, which stems from the requirement for a fixed distribution of records. Previous studies have proposed various approaches to address this issue, but they incur high overhead due to repeated model retraining. In this paper, we present UpLIF, an adaptive self-tuning learned index that adjusts the model to accommodate incoming updates, predicts the distribution of updates for performance improvement, and optimizes its index structure using reinforcement learning. We also introduce the concept of balanced model adjustment, which determines the model's inherent properties (i.e., bias and variance), enabling the integration of these factors into the existing index model without the need for retraining with new data. Our comprehensive experiments show that the system surpasses state-of-the-art indexing solutions (both traditional and ML-based), achieving an increase in throughput of up to 3.12 times with 1000 times less memory usage.
