On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data: Extended Version
Terézia Slanináková, Jaroslav Olha, David Procházka, Matej Antol, Vlastislav Dohnal
TL;DR
This paper tackles the challenge of adapting learned indexes to dynamically expanding, high-dimensional data. It proposes a lightweight dynamization approach that converts a static learned index into a dynamic one through node splitting (deepening) and broadening, together with an amortized cost model for comparing the dynamic index against static builds. The approach is demonstrated on the Learned Metric Index, where the dynamized version's total costs scale more favorably as the database grows. The work offers a general, practical pathway for extending static learned indexes to dynamic scenarios, along with a decision framework for when dynamization is advantageous.
Abstract
One of the main challenges in the growing research area of learned indexing is the lack of adaptability to dynamically expanding datasets. This paper explores the dynamization of a static learned index for complex data through operations such as node splitting and broadening, enabling efficient adaptation to new data. Furthermore, we evaluate the trade-offs between static and dynamic approaches by introducing an amortized cost model that assesses query performance together with the build costs of the index structure, which allows us to determine experimentally when a dynamic learned index outperforms its static counterpart. We apply the dynamization method to a static learned index and demonstrate that, owing to its superior scaling, the dynamized index quickly outperforms the static implementation in overall cost as the database grows. This is an extended version of the paper presented at DaWaK 2025.
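To make the build-versus-query trade-off concrete, the following is a deliberately simplified sketch, not the paper's cost model. It assumes each strategy can be summarized by a single one-off build cost and a single average query cost, with the build cost amortized over the number of queries served. The names `IndexCosts`, `amortized_cost` and `break_even_queries` are hypothetical, and the figures in the usage lines are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexCosts:
    """Hypothetical cost profile of one indexing strategy (arbitrary units, e.g. ms)."""
    name: str
    build_cost: float   # one-off cost to make the index ready for the current data
    query_cost: float   # average cost of answering a single query


def amortized_cost(idx: IndexCosts, n_queries: int) -> float:
    """Per-query cost once the build cost is spread over n_queries queries."""
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    return idx.build_cost / n_queries + idx.query_cost


def break_even_queries(a: IndexCosts, b: IndexCosts) -> Optional[float]:
    """Number of queries at which a and b have equal amortized cost.

    Solves a.build/n + a.query = b.build/n + b.query for n.
    Returns None when no positive break-even point exists, i.e. one
    strategy is never worse than the other for any workload size.
    """
    dq = b.query_cost - a.query_cost
    if dq == 0:
        return None
    n = (a.build_cost - b.build_cost) / dq
    return n if n > 0 else None


# Invented example: the static build is expensive but answers queries faster.
static = IndexCosts("static build", build_cost=10_000, query_cost=1.0)
dynamic = IndexCosts("dynamized", build_cost=2_000, query_cost=3.0)
print(break_even_queries(static, dynamic))  # 4000.0
```

With these invented figures, the dynamized index is cheaper per query for workloads below 4,000 queries and the static build is cheaper beyond that. The sketch ignores how both costs change as the database grows, which is the setting the abstract is concerned with.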
