
HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads

Xinyi Zhang, Liang Liang, Anastasia Ailamaki, Jianliang Xu

TL;DR

HIRE addresses the instability and suboptimal range-query performance of prior updatable learned indexes under mixed workloads. It combines a balanced tree with hybrid leaf nodes, model-accelerated internal nodes, and a non-blocking, cost-driven recalibration framework. An inter-level bulk-loading algorithm jointly accounts for prediction errors across tree levels, and a log-based update mechanism reduces data movement during updates, while RCU-based (read-copy-update) retraining keeps the index available. Empirically, HIRE achieves up to 41.7× higher throughput under mixed workloads and up to 98% lower tail latency, with robust performance on both model-friendly and model-unfriendly data distributions. In practice, HIRE offers a scalable, robust in-memory index for modern DBMS workloads that need fast point and range queries with stable latency.
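The summary gives no details of the log-based update mechanism. As a hedged illustration of why buffering updates in a log reduces data movement, the Python sketch below leaves a sorted key array untouched on insert and delete, records each operation in an append-only log, and merges the log only once it fills. The class `LogBufferedNode`, the `log_capacity` parameter, and the merge-by-replay policy are hypothetical choices for illustration, not HIRE's actual node layout; the sketch also assumes unique keys and ignores concurrency (RCU-based retraining is not modeled).

```python
import bisect


class LogBufferedNode:
    """Hypothetical node: a sorted base array plus an append-only update log.

    Inserts and deletes are appended to the log instead of shifting the
    sorted array; the log is merged into the array once it reaches
    `log_capacity` (an illustrative parameter, not taken from HIRE).
    Assumes keys are unique.
    """

    def __init__(self, keys, log_capacity=4):
        self.keys = sorted(keys)
        self.log = []  # (key, is_delete) pairs in arrival order
        self.log_capacity = log_capacity
        self.merges = 0  # number of merges performed so far

    def insert(self, key):
        self.log.append((key, False))
        self._maybe_merge()

    def delete(self, key):
        self.log.append((key, True))
        self._maybe_merge()

    def contains(self, key):
        # The most recent log entry for a key overrides the base array.
        for k, is_delete in reversed(self.log):
            if k == key:
                return not is_delete
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def _maybe_merge(self):
        if len(self.log) < self.log_capacity:
            return
        live = set(self.keys)
        for k, is_delete in self.log:  # replay operations in order
            if is_delete:
                live.discard(k)
            else:
                live.add(k)
        self.keys = sorted(live)  # one bulk rewrite instead of per-update shifts
        self.log.clear()
        self.merges += 1
```

For example, starting from `[10, 20, 30]`, inserting 15 and deleting 20 leaves `keys` unchanged and only grows the log; the fourth logged operation triggers a single merge. The trade-off is that lookups must also scan the log, which is why a small capacity is used here.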

Abstract

Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to learned indexes, which model the cumulative distribution function (CDF) of the data to predict search positions and accelerate query processing. Although learned indexes substantially outperform traditional structures for point lookups, they often suffer from high tail latency, suboptimal range-query performance, and inconsistent effectiveness across diverse workloads. To address these challenges, this paper proposes HIRE, a hybrid in-memory index structure designed to deliver consistently efficient performance. HIRE combines the structural and performance robustness of traditional indexes with model-based prediction, reducing search overhead while maintaining worst-case stability. Specifically, it employs (1) hybrid leaf nodes that adapt to varying data distributions and workloads, (2) model-accelerated internal nodes augmented with log-based updates, (3) a nonblocking, cost-driven recalibration mechanism for dynamic data, and (4) an inter-level optimized bulk-loading algorithm that accounts for both leaf and internal-node errors. Experimental results on multiple real-world datasets show that HIRE outperforms both state-of-the-art learned indexes and traditional structures in range-query throughput, tail latency, and overall stability. Compared with these indexes, HIRE achieves up to 41.7× higher throughput under mixed workloads and reduces tail latency by up to 98% across varying scenarios.
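To make the CDF-modeling idea concrete, here is a minimal, generic learned-index sketch, not HIRE's internal-node design. A linear model maps a key to an approximate position in a sorted array, and the largest prediction error observed on the stored keys bounds a final binary search. The class name `LinearModelIndex` and the two-endpoint linear fit are assumptions chosen for brevity (real systems typically use least-squares or piecewise models); the sketch assumes a non-empty list of numeric keys.

```python
import bisect


class LinearModelIndex:
    """Generic learned-index sketch: a linear model approximates the CDF
    (key -> position in a sorted array), and the worst-case training error
    bounds the final binary search. Assumes a non-empty list of numeric keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Fit position ~ slope * key + intercept through the two endpoints
        # (a deliberately simple fit chosen for this illustration).
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Worst-case prediction error over the stored keys.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of `key`, or None if it is absent."""
        n = len(self.keys)
        pos = min(max(self._predict(key), 0), n - 1)  # clamp prediction
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, n)
        # Binary search only inside the error window [lo, hi).
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

Because the search window is derived from the worst training error, a poor fit on model-unfriendly data widens the window and slows lookups rather than producing wrong answers; this distribution-dependent cost is the kind of variability that adaptive, hybrid node designs aim to contain.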

Paper Structure

This paper contains 36 sections, 5 equations, 16 figures, 1 table, and 3 algorithms.

Figures (16)

  • Figure 1: Illustration of performance limitations of existing learned indexes under a balanced mixed workload (Query:Insert:Delete = 1:1:1). (a): Distributions of two SOSD datasets. (b): Performance comparisons of different indexes. (c): Latencies of insertion, deletion, and range query operations on the OSM dataset.
  • Figure 2: Structure of HIRE
  • Figure 3: Search of HIRE
  • Figure 4: Updates of HIRE
  • Figure 5: Retraining of HIRE