On the Power of Learning-Augmented Search Trees

Jingbang Chen; Xinyuan Cao; Alicia Stepin; Li Chen

TL;DR

The paper designs learning-augmented search structures that adapt to arbitrary input distributions and temporal access patterns. It introduces composite Treap priorities, which blend a randomized component with a learned one, to achieve $\mathbb{E}[\mathrm{depth}(x)] = O\big(\log_2(1/w_x)\big)$; when $w_x$ is the relative frequency of item $x$, the tree is statically optimal. The same idea extends to B-Trees via B-Treaps and to dynamic settings that attain the working-set bound. The work also gives robustness guarantees: prediction errors enter the cost additively, measured by cross-entropy/KL divergence, and the structures support priorities that change over time. Empirically, the proposed structures outperform traditional BSTs and prior learned-index approaches on Zipfian, adversarial, and uniform distributions, including with imperfect predictions, which makes them relevant to adaptive indexing and external-memory data structures.
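
The link between the depth bound, static optimality, and the additive error term can be made concrete with a short derivation. This is a sketch under assumptions not taken from the paper: $p_x$ denotes the true access frequency of $x$, $q_x$ the predicted weight used in place of $w_x$, and constants and lower-order additive terms are dropped. The paper's exact theorem statements may differ.

```latex
\begin{align*}
\mathbb{E}[\text{cost per access}]
  &= \sum_x p_x \, \mathbb{E}[\mathrm{depth}(x)]
   = O\Big(\sum_x p_x \log_2 \frac{1}{q_x}\Big), \\
\sum_x p_x \log_2 \frac{1}{q_x}
  &= \underbrace{\sum_x p_x \log_2 \frac{1}{p_x}}_{H(p)}
   + \underbrace{\sum_x p_x \log_2 \frac{p_x}{q_x}}_{D_{\mathrm{KL}}(p \,\|\, q)}.
\end{align*}
```

With exact predictions ($q = p$) the KL term vanishes and the cost is $O(H(p))$, the entropy bound that characterizes static optimality; with inexact predictions the penalty is the additive term $D_{\mathrm{KL}}(p \,\|\, q)$.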

Abstract

We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities. The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$. Specifically, each item $x$ is assigned a composite priority of $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$ where $U(0, 1)$ is the uniform random variable. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality. This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML '22], which only work for Zipfian distributions, by extending them to arbitrary input distributions. Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP '09]. Our search trees are also capable of leveraging localities in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations, such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
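
The composite priority in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes base-2 logarithms, weights strictly between 0 and 1, and a max-heap Treap in which the item with the largest priority becomes the root. The names `composite_priority` and `treap_depths` are hypothetical, and the depths are computed directly from the Treap's defining property rather than through rotations.

```python
import math
import random


def composite_priority(w, rng):
    """Return -floor(log2(log2(1/w))) + U(0, 1).

    Assumptions of this sketch: base-2 logs and 0 < w < 1.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    tier = math.floor(math.log2(math.log2(1.0 / w)))
    return -tier + rng.random()


def treap_depths(keys, priority):
    """Depth of every key in the unique Treap on (key, priority) pairs.

    Max-heap convention: within any key range, the key with the largest
    priority is the subtree root. The root has depth 0.
    """
    order = sorted(keys)
    depths = {}
    stack = [(0, len(order), 0)]  # half-open index range and its depth
    while stack:
        lo, hi, d = stack.pop()
        if lo >= hi:
            continue
        root = max(range(lo, hi), key=lambda i: priority[order[i]])
        depths[order[root]] = d
        stack.append((lo, root, d + 1))
        stack.append((root + 1, hi, d + 1))
    return depths


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    # Zipfian-style weights w_i proportional to 1/i, normalized to sum to 1.
    norm = sum(1.0 / i for i in range(1, n + 1))
    weights = {i: (1.0 / i) / norm for i in range(1, n + 1)}
    prio = {k: composite_priority(w, rng) for k, w in weights.items()}
    depths = treap_depths(weights, prio)
    avg = sum(weights[k] * depths[k] for k in weights)
    print(f"weighted average depth: {avg:.2f}")
```

Items whose weights fall in the same $\lfloor\log\log(1/w_x)\rfloor$ tier are ordered only by the random term, while any item in a heavier tier always sits above any item in a lighter tier on the same root-to-leaf path.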

Paper Structure (42 sections, 32 theorems, 70 equations, 8 figures)

Introduction
Overview
Learning-Augmented Treaps via Composite Priority Functions.
Static Optimality of Learning-Augmented Search Trees.
Dynamic Learning-Augmented Search Trees.
Robustness to Prediction Inaccuracy.
Related Work
Learning-Augmented Binary Search Trees
Learning-Augmented Treaps
Proof Plan.
Static Optimality
Robustness Guarantees
Analysis of Other Priority Assignments
Learning-Augmented B-Trees
Dynamic Learning-Augmented Search Trees
...and 27 more sections

Key Result

Lemma 2.1

Let $U(0, 1)$ be the uniform distribution over the real interval $[0, 1]$. If $\mathsf{priority} \sim U(0, 1)^n$, each Treap node $x$ has depth $\Theta(\log_2 n)$ with high probability.
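
A small simulation can make the lemma tangible. The sketch below is an illustration, not code from the paper. It relies on the standard ancestor characterization of Treaps: under a max-heap convention (assumed here), $y$ is an ancestor of $x$ exactly when $y$ has the largest priority among all keys between $x$ and $y$ inclusive. The function name `depth_by_ancestor_count` is hypothetical, and the root is given depth 0.

```python
import math
import random


def depth_by_ancestor_count(priorities, i):
    """Depth of the key at sorted position i; priorities are listed in key order.

    Scans outward from position i in both directions. A position j holds an
    ancestor iff its priority exceeds every priority between j and i,
    including the priority at i itself.
    """
    depth = 0
    for step in (-1, 1):
        running_max = priorities[i]
        j = i + step
        while 0 <= j < len(priorities):
            if priorities[j] > running_max:
                running_max = priorities[j]
                depth += 1
            j += step
    return depth


if __name__ == "__main__":
    rng = random.Random(1)
    n = 512
    pr = [rng.random() for _ in range(n)]  # priority ~ U(0, 1)^n
    depths = [depth_by_ancestor_count(pr, i) for i in range(n)]
    print("log2(n):", math.log2(n))
    print("mean depth:", sum(depths) / n)
    print("max depth:", max(depths))
```

Under uniform priorities, the key of rank $i$ has expected depth $H_i + H_{n-i+1} - 2$, where $H_k$ is the $k$-th harmonic number, so the mean printed above should be on the order of $\log n$.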

Figures (8)

Figure 1: Sketch of static and dynamic learning-augmented search trees. Since item 3 has a higher frequency around time $i$, the dynamic search tree adjusts its priority accordingly.
Figure 2: Zipfian distribution, $\alpha=1$.
Figure 3: Adversarial distribution.
Figure 4: Uniform distribution.
Figure 5: Inaccurate Prediction Oracle.
...and 3 more figures

Theorems & Definitions (64)

Definition 2.1: Treap [AS89]
Lemma 2.1: [AS89]
Lemma 2.2: [AS89]
Theorem 2.3: Learning-Augmented Treap via Composite Priorities
Lemma 2.4
proof
Lemma 2.5
proof
proof: Proof of Theorem 2.3
Corollary 2.6
...and 54 more
