How Good Are Multi-dimensional Learned Indices? An Experimental Survey
Qiyu Liu, Maocheng Li, Yuxiang Zeng, Yanyan Shen, Lei Chen
TL;DR
The paper addresses the lack of a unified, rigorous evaluation of multi-dimensional learned indices. It classifies existing approaches into projection-, augmentation-, and grid-based families, and implements six representative indices under a single experimental framework with real and synthetic datasets. The study finds that projection- and grid-based learned indices can substantially reduce index size and accelerate range queries, but kNN queries and dynamic updates remain challenging, with no single method beating traditional spatial indices in all scenarios. These findings guide future design toward improved updateability, broader query support, and hardware-conscious optimizations for practical deployment.
Abstract
Efficient indexing is fundamental for multi-dimensional data management and analytics. An emerging trend is to directly learn the storage layout of multi-dimensional data with simple machine learning models, yielding the concept of the Learned Index. Compared with the conventional indices used for decades (e.g., kd-tree and R-tree variants), learned indices are empirically shown to be both space- and time-efficient on modern architectures. However, a comprehensive evaluation of existing multi-dimensional learned indices under a unified benchmark is still lacking, which makes it difficult to choose a suitable index for specific data and queries and further prevents the deployment of learned indices in real application scenarios. In this paper, we present the first in-depth empirical study to answer the question of how good multi-dimensional learned indices are. Six recently published indices are evaluated under a unified experimental configuration including index implementation, datasets, query workloads, and evaluation metrics. We thoroughly investigate the evaluation results and discuss the findings that may provide insights for future learned index design.
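To make the idea of "learning the storage layout" concrete, the following is a minimal illustrative sketch, not one of the six indices evaluated in the paper. It assumes one common reading of the projection-based family: 2-D integer points are mapped to 1-D keys with a Z-order (Morton) curve, the keys are sorted, and a single linear model predicts a key's position in the sorted array. The largest prediction error seen at build time bounds the window that must be searched at query time. The names `morton2d` and `ToyLearnedIndex` are hypothetical, as is the choice of a single linear model through the two endpoint keys.

```python
import bisect


def morton2d(x, y, bits=16):
    """Interleave the low `bits` bits of non-negative ints x and y (Z-order key)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


class ToyLearnedIndex:
    """Hypothetical sketch: one linear model over sorted Z-order keys.

    Assumes a non-empty list of points with coordinates in [0, 2**16).
    """

    def __init__(self, points):
        self.keys = sorted(morton2d(x, y) for x, y in points)
        n = len(self.keys)
        first, last = self.keys[0], self.keys[-1]
        # Linear model through the two endpoint keys: position ~ slope * key + intercept.
        self.slope = (n - 1) / (last - first) if last > first else 0.0
        self.intercept = -self.slope * first
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def contains(self, x, y):
        """Point lookup: predict a position, then binary-search the error window."""
        key = morton2d(x, y)
        n = len(self.keys)
        pos = self._predict(key)
        lo = min(max(0, pos - self.max_err), n)
        hi = max(lo, min(n, pos + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i < hi and self.keys[i] == key


index = ToyLearnedIndex([(1, 2), (3, 4), (10, 10), (7, 0), (100, 200)])
found = index.contains(10, 10)
missing = index.contains(5, 5)
```

In this sketch the "index" consists of two floats and one integer on top of the sorted key array, which is the intuition behind the space savings reported for learned indices. A single linear model can have a large error window on skewed data, so practical designs typically use several models or a hierarchy; range and kNN queries also need extra logic that is omitted here.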
