UpLIF: An Updatable Self-Tuning Learned Index Framework

Alireza Heidari, Amirhossein Ahmadi, Wei Zhang

TL;DR

UpLIF tackles the challenge of updating learned indexes without frequent retraining by introducing a modular, self-tuning framework. It combines a linear-model adjustment $M'(k)=\Gamma(k)M(k)+r(k)$ with the Balanced Model Adjustment Tree (BMAT) and an Update Placeholder (Nullifier) to absorb updates while preserving index efficiency; an RL-based optimizer tunes BMAT type and retraining decisions. Empirical results show UpLIF delivers up to $3.12\times$ throughput improvements and up to $1000\times$ lower memory usage compared with baselines, and remains robust under distribution shifts and across large-scale datasets. The approach is generic to any sorted-key learned index, enabling practical, scalable dynamic indexing for modern workloads.

Abstract

The emergence of learned indexes has caused a paradigm shift in our perception of indexing by considering indexes as predictive models that estimate keys' positions within a data set, resulting in notable improvements in key search efficiency and index size reduction; however, a significant challenge inherent in learned index modeling is its constrained support for update operations, necessitated by the requirement for a fixed distribution of records. Previous studies have proposed various approaches to address this issue with the drawback of high overhead due to multiple model retraining. In this paper, we present UpLIF, an adaptive self-tuning learned index that adjusts the model to accommodate incoming updates, predicts the distribution of updates for performance improvement, and optimizes its index structure using reinforcement learning. We also introduce the concept of balanced model adjustment, which determines the model's inherent properties (i.e. bias and variance), enabling the integration of these factors into the existing index model without the need for retraining with new data. Our comprehensive experiments show that the system surpasses state-of-the-art indexing solutions (both traditional and ML-based), achieving an increase in throughput of up to 3.12 times with 1000 times less memory usage.
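To make the adjustment $M'(k)=\Gamma(k)M(k)+r(k)$ from the TL;DR concrete, the following minimal sketch wraps an existing position model with a scale term and a bias term instead of retraining it. The names `make_adjusted_model`, `base_model`, `gamma`, and `r` are hypothetical, and the piecewise-constant bias with $\Gamma(k)=1$ is an illustrative assumption; it is not the paper's BMAT, which is not reproduced here.

```python
def make_adjusted_model(model, gamma, r):
    """Return the adjusted model M'(k) = gamma(k) * M(k) + r(k)."""
    def adjusted(key):
        return gamma(key) * model(key) + r(key)
    return adjusted


# Toy base model (assumption): keys 10, 20, ..., 100 sit at positions 0..9,
# so a linear model predicts them exactly.
def base_model(key):
    return key / 10 - 1


# One update (key 55) lands between 50 and 60. Every key above 50 moves one
# slot to the right, which a constant bias can express without retraining.
def r(key):
    return 1.0 if key > 50 else 0.0


# No rescaling is needed in this toy case (assumption).
def gamma(key):
    return 1.0


adjusted = make_adjusted_model(base_model, gamma, r)

# Ground truth after the insert, for comparison with the adjusted model.
updated_keys = sorted(list(range(10, 101, 10)) + [55])
```

In this toy setting `adjusted(60)` agrees with the true position of 60 in `updated_keys`, while keys at or below 50 keep their original predictions. How the scale and bias are chosen per key range in practice is the role the text assigns to the BMAT.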

Paper Structure (19 sections, 9 equations, 6 figures, 2 tables, 1 algorithm)

This paper contains 19 sections, 9 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) The approximator module, which learns the distribution $\mathcal{D}_{update}$ from the incoming updates $U$. (b) Overview of UpLIF's modules. Module 1 predicts an approximate index based on the existing model $M_i(k)$. Module 2 then uses probabilistic methods to generate an offset that adjusts the approximate index for updates, denoted $\epsilon_U$, as well as an adjustment for the model error, denoted $\epsilon_M$. Module 3 combines the results of Module 1 and Module 2 with the original model $M_i(k)$ to create a new model that answers lookup queries for a given key $k$. Finally, Module 4 evaluates the system's performance and, using the updates, the current model $M$, the variance $\Gamma$, the bias $r$, and a model trained with Q-learning, produces either a new model $M'$ or updated values $\Gamma'$ and $r'$.
  • Figure 2: The learned index model $M_i$, characterized by a nonzero error $E$, maps four ranges over the key space. Incoming updates, treated as a random variable, affect this range in four distinct ways: $u_1$ uniformly shifts all elements without changing the range size; $u_2$ expands the range on the left by shifting $M_i(k)$ rightward; $u_3$ enlarges the range on the right; and $u_4$ neither alters the error range size nor moves any internal components.
  • Figure 3: UpLIF overview of update and lookup operations. (1) Initially, UpLIF creates a placeholder structure in the key domain and constructs an index model on top of it. (2) An update with $key=5$ arrives and can be placed in an empty placeholder. (3) An update with $key=7$ arrives and, under the current model, conflicts with $key=6$. UpLIF divides the key domain into three segments and adjusts the previous model for each segment without retraining. It also adds placeholders with $\alpha=2$ in the middle segment (see the Nullifier section). Finally, two nodes are added to the BMAT, which can be a red-black tree (left) or a B+Tree (right).
  • Figure 4: Comparison between BMAT types. The plots show the performance and memory consumption of RBMAT normalized to B+MAT. Lower is better for memory; higher is better for performance.
  • Figure 5: Overview of the RL agent in four steps. The agent (1) retrieves the current state (the BMAT structure) from the Approximator module and (2) executes the best action, which can be (i) keeping the current BMAT structure, (ii) retraining on a subset of the data, or (iii) migrating to another BMAT type. (3, 4) Finally, the agent measures the performance and memory consumption of the executed action and updates the Q-values in the Q-table based on the calculated reward.
  • ...and 1 more figures
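
The Q-table update described for the RL agent in Figure 5 can be illustrated with a standard tabular Q-learning step. This is a sketch under stated assumptions: the state labels, the `reward` formula (throughput minus a weighted memory cost), the `memory_weight`, the learning rate `lr`, and the `discount` factor are all hypothetical choices for illustration, and only the three actions come from the caption.

```python
from collections import defaultdict

# The three actions named in the Figure 5 caption.
ACTIONS = ("keep", "retrain", "migrate")


def make_q_table():
    """Q-table mapping state -> {action: value}, initialised to zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def best_action(q_table, state):
    """Greedy choice; ties resolve to the earliest action in ACTIONS."""
    return max(ACTIONS, key=lambda a: q_table[state][a])


def reward(throughput, memory, memory_weight=0.5):
    """Hypothetical reward: favour throughput, penalise memory use."""
    return throughput - memory_weight * memory


def q_update(q_table, state, action, rew, next_state, lr=0.1, discount=0.9):
    """Standard Q-learning step:
    Q(s, a) <- Q(s, a) + lr * (rew + discount * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q_table[next_state].values())
    old = q_table[state][action]
    q_table[state][action] = old + lr * (rew + discount * best_next - old)
    return q_table[state][action]


# Example: the agent migrated from one BMAT type to the other and measured
# normalised throughput 2.0 and memory 1.0 (made-up numbers).
q = make_q_table()
first_choice = best_action(q, "rbmat")
new_value = q_update(q, "rbmat", "migrate", reward(2.0, 1.0), "b+mat")
```

With an all-zero table the greedy choice defaults to `keep`; after one positive reward for `migrate`, that action becomes the preferred one in the `rbmat` state. A practical agent would also need an exploration policy, which this sketch leaves out.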

Theorems & Definitions (4)

  • Definition
  • Definition
  • Definition: Scalier
  • Definition: Nullify