HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads
Xinyi Zhang, Liang Liang, Anastasia Ailamaki, Jianliang Xu
TL;DR
HIRE addresses the instability and suboptimal range-query performance of prior updatable learned indexes under mixed workloads by combining a balanced tree with hybrid leaf nodes, model-accelerated internal nodes, and a non-blocking, cost-driven recalibration framework. The approach uses an inter-level bulk-loading algorithm to optimize multi-layer errors and a log-based update mechanism to reduce data movement during updates, all while maintaining high availability via RCU-based retraining. Empirical results show substantial gains: up to 41.7x higher throughput and up to 98% tail-latency reductions on mixed workloads, with robust performance across model-friendly and model-unfriendly data distributions. Practically, HIRE offers a scalable, robust in-memory indexing solution suitable for modern DBMS workloads requiring fast point/range queries and stable latency.
Abstract
Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to the development of learned indexes, which model the cumulative distribution function of data to predict search positions and accelerate query processing. While learned indexes substantially outperform traditional structures for point lookups, they often suffer from high tail latency, suboptimal range query performance, and inconsistent effectiveness across diverse workloads. To address these challenges, this paper proposes HIRE, a hybrid in-memory index structure designed to deliver consistently efficient performance. HIRE combines the structural and performance robustness of traditional indexes with the predictive power of learned models to reduce search overhead while maintaining worst-case stability. Specifically, it employs (1) hybrid leaf nodes adaptive to varying data distributions and workloads, (2) model-accelerated internal nodes augmented by log-based updates for efficient updates, (3) a non-blocking, cost-driven recalibration mechanism for dynamic data, and (4) an inter-level optimized bulk-loading algorithm accounting for leaf and internal-node errors. Experimental results on multiple real-world datasets demonstrate that HIRE outperforms both state-of-the-art learned indexes and traditional structures in range-query throughput, tail latency, and overall stability. Compared to state-of-the-art learned indexes and traditional indexes, HIRE achieves up to 41.7$\times$ higher throughput under mixed workloads and reduces tail latency by up to 98% across varying scenarios.
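To make the general learned-index idea mentioned above concrete, the following minimal sketch fits a single linear model to the positions of sorted keys (an approximation of the scaled cumulative distribution function) and then corrects the prediction with a binary search bounded by the model's maximum observed error. It illustrates the generic technique only and is not HIRE's design: the class name `LinearLearnedIndex`, its methods, and the single-segment least-squares fit are assumptions introduced here for illustration, and the sketch omits updates, range queries, and multi-level structure.

```python
import bisect


class LinearLearnedIndex:
    """Toy single-segment learned index over numeric keys (illustrative only)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit: position ~ slope * key + intercept.
        # This approximates the CDF of the keys scaled by n.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest gap between predicted and true position over stored keys.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Clamp the rounded prediction to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        pos = self._predict(key)
        # Any stored key lies within max_err slots of its prediction,
        # so the binary search can be limited to that window.
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The search window grows with `max_err`, so a distribution that a single line fits poorly degrades toward a full binary search. This sensitivity to the data distribution is the kind of behavior the abstract describes as inconsistent effectiveness across workloads.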
