On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data: Extended Version

Terézia Slanináková, Jaroslav Olha, David Procházka, Matej Antol, Vlastislav Dohnal

TL;DR

This paper tackles the challenge of adapting learned indexes to dynamically expanding, high-dimensional data. It proposes a lightweight dynamization approach that converts a static learned index into a dynamic one via node splitting (deepening) and broadening, paired with an amortized cost model to compare against static builds. The approach is demonstrated by applying dynamization to the Learned Metric Index and showing that total costs scale more favorably as the database grows. The work provides a general, practical pathway to extend static learned indexes to dynamic scenarios and a decision framework for when dynamization is advantageous.

Abstract

One of the main challenges within the growing research area of learned indexing is the lack of adaptability to dynamically expanding datasets. This paper explores the dynamization of a static learned index for complex data through operations such as node splitting and broadening, enabling efficient adaptation to new data. Furthermore, we evaluate the trade-offs between static and dynamic approaches by introducing an amortized cost model that assesses query performance in tandem with the build costs of the index structure, making it possible to determine experimentally when a dynamic learned index outperforms its static counterpart. We apply the dynamization method to a static learned index and demonstrate that, owing to its superior scaling, the dynamized index quickly outperforms the static implementation in terms of overall costs as the database grows. This is an extended version of the paper presented at DAWAK 2025.
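
To make the idea of an amortized cost more concrete, the following is a minimal sketch of how build and query costs might be combined for a naive periodic-rebuild baseline. It is not the paper's cost model: the function name, the linear build cost, and the assumption that objects inserted since the last rebuild are scanned sequentially are all hypothetical choices made for illustration only.

```python
def naive_rebuild_amortized_cost(
    n_inserts,
    rebuild_interval,
    queries_per_insert,
    build_cost_per_object=1.0,   # hypothetical: build cost grows linearly with index size
    indexed_query_cost=1.0,      # hypothetical: fixed cost of searching the built index
    scan_cost_per_object=0.5,    # hypothetical: cost of scanning one not-yet-indexed object
):
    """Return (total build cost + total query cost) / number of queries.

    Illustrative assumption: the index is rebuilt from scratch every
    `rebuild_interval` inserts, and objects inserted since the last
    rebuild are scanned sequentially by every query.
    """
    if n_inserts <= 0 or rebuild_interval <= 0 or queries_per_insert <= 0:
        raise ValueError("all counts must be positive")

    total_build = 0.0
    total_query = 0.0
    unindexed = 0  # objects inserted since the last rebuild

    for size in range(1, n_inserts + 1):
        unindexed += 1
        if size % rebuild_interval == 0:
            # Full rebuild over all objects currently in the database.
            total_build += build_cost_per_object * size
            unindexed = 0
        # Queries issued after this insert pay for the index search
        # plus a sequential scan of the unindexed objects.
        per_query = indexed_query_cost + scan_cost_per_object * unindexed
        total_query += queries_per_insert * per_query

    return (total_build + total_query) / (n_inserts * queries_per_insert)


if __name__ == "__main__":
    # Compare a few rebuild intervals under one query per insert.
    for interval in (1, 10, 100, 1000):
        cost = naive_rebuild_amortized_cost(1000, interval, queries_per_insert=1)
        print(f"rebuild every {interval:>4} inserts: amortized cost {cost:.2f}")
```

Under these assumed costs, rebuilding too often inflates the build term while rebuilding too rarely inflates the query term, which is the kind of trade-off an amortized view is meant to expose. Raising `queries_per_insert` spreads the build cost over more queries and shifts the balance toward more frequent rebuilds.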

Paper Structure

This paper contains 18 sections, 1 equation, 8 figures, 2 tables, 3 algorithms.

Figures (8)

  • Figure 1: Overview of the deepening operation.
  • Figure 2: Overview of the broadening operation.
  • Figure 3: Overview of the shortening operation.
  • Figure 4: The amortized cost of a Naive rebuild baseline at different rebuild intervals (setup with 1 query per new object and a target recall of 0.5).
  • Figure 5: Amortized costs of the dynamized index and various baselines in the high-intensity, high-target-recall scenario (100 queries per insert, target recall of 0.9).
  • ...and 3 more figures