On the Power of Learning-Augmented Search Trees

Jingbang Chen, Xinyuan Cao, Alicia Stepin, Li Chen

TL;DR

The paper designs learning-augmented search structures that adapt to arbitrary input distributions and temporal access patterns. It introduces composite Treap priorities that combine a learned component with a random one, giving $\mathbb{E}[\mathrm{depth}(x)] = O\big(\log_2(1/w_x)\big)$ for every item $x$ with predicted weight $w_x$; when $w_x$ is the relative frequency of $x$, this yields static optimality. The same idea extends to B-Trees via B-Treaps and to a dynamic setting, where changing priorities online realizes the working-set bound. The structures are also robust: prediction errors contribute to the cost only additively, through cross-entropy/KL-divergence measures. Empirically, the proposed structures outperform traditional BSTs and prior learning-augmented search trees on Zipfian, adversarial, and uniform distributions, including with imperfect predictions, which makes them relevant to adaptive indexing and external-memory data structures.

Abstract

We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities. The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$. Specifically, each item $x$ is assigned a composite priority of $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$, where $U(0, 1)$ is a uniform random variable on $(0, 1)$. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality. This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML '22], which only work for Zipfian distributions, by extending them to arbitrary input distributions. Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP '09]. Our search trees are also capable of leveraging localities in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations, such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
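The composite priority described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`composite_priority`, `insert`, `build`, `depth`), the base-2 logarithms, the clamp that keeps the outer logarithm defined for $w_x \ge 1/2$, and the example weights are all assumptions made for illustration. The Treap is treated as a max-heap on priorities, so items with larger predicted weight sit closer to the root.

```python
import math
import random


def composite_priority(w, rng):
    """Return -floor(log2(log2(1/w))) + U(0, 1) for a predicted weight 0 < w <= 1."""
    # Assumption (not from the source): clamp log2(1/w) to at least 1 so the
    # outer logarithm is defined when w >= 1/2; such items land in tier 0.
    inner = max(1.0, math.log2(1.0 / w))
    return -math.floor(math.log2(inner)) + rng.random()


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def insert(node, key, priority):
    """Standard Treap insertion: BST order on keys, max-heap order on priorities."""
    if node is None:
        return Node(key, priority)
    if key < node.key:
        node.left = insert(node.left, key, priority)
        if node.left.priority > node.priority:  # rotate right
            child = node.left
            node.left, child.right = child.right, node
            return child
    else:
        node.right = insert(node.right, key, priority)
        if node.right.priority > node.priority:  # rotate left
            child = node.right
            node.right, child.left = child.left, node
            return child
    return node


def build(weights, seed=0):
    """Build a Treap from a {key: predicted weight} mapping."""
    rng = random.Random(seed)
    root = None
    for key, w in weights.items():
        root = insert(root, key, composite_priority(w, rng))
    return root


def depth(root, key):
    """Number of edges from the root to `key` (the key must be present)."""
    d, node = 0, root
    while node.key != key:
        node = node.left if key < node.key else node.right
        d += 1
    return d


if __name__ == "__main__":
    # Hypothetical workload: key 7 receives half of all accesses.
    n = 64
    weights = {k: 0.5 / (n - 1) for k in range(n)}
    weights[7] = 0.5
    root = build(weights)
    print("depth of hot key 7:", depth(root, 7))
    print("depth of cold key 40:", depth(root, 40))
```

In this sketch the integer part of the priority sorts items into tiers by predicted weight, while the uniform part keeps each tier a random Treap, which is the intuition behind the depth bound quoted in the TL;DR.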

Paper Structure (42 sections, 32 theorems, 70 equations, 8 figures)

This paper contains 42 sections, 32 theorems, 70 equations, 8 figures.

Key Result

Lemma 2.1

Let $U(0, 1)$ be the uniform distribution over the real interval $[0, 1]$. If $\mathsf{priority} \sim U(0, 1)^n$, each Treap node $x$ has depth $\Theta(\log_2 n)$ with high probability.
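The lemma can be checked empirically with a small simulation. The sketch below is an illustration rather than part of the paper: the function name `random_treap_depths`, the seed, and the value of `n` are assumptions. It draws i.i.d. $U(0, 1)$ priorities for keys $0, \dots, n-1$, builds the corresponding max-heap Treap with the standard stack-based Cartesian-tree construction, and reports node depths for comparison with $\log_2 n$.

```python
import math
import random


def random_treap_depths(n, seed=0):
    """Depths of keys 0..n-1 in a Treap with i.i.d. U(0, 1) priorities."""
    rng = random.Random(seed)
    priority = [rng.random() for _ in range(n)]
    parent = [-1] * n
    stack = []  # right spine of the tree built so far, priorities decreasing
    for i in range(n):
        last = -1
        # Pop spine nodes with lower priority; the last one popped
        # becomes the left child of the new node i.
        while stack and priority[stack[-1]] < priority[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i
        if stack:
            parent[i] = stack[-1]  # i is the right child of the spine top
        stack.append(i)
    # A parent always has higher priority than its child, so visiting nodes
    # in decreasing priority guarantees the parent's depth is already known.
    depths = [0] * n
    for i in sorted(range(n), key=lambda j: -priority[j]):
        if parent[i] != -1:
            depths[i] = depths[parent[i]] + 1
    return depths


if __name__ == "__main__":
    n = 1 << 14  # hypothetical size chosen for the demo
    depths = random_treap_depths(n)
    print("log2(n)    :", math.log2(n))
    print("mean depth :", sum(depths) / n)
    print("max depth  :", max(depths))
```

Rerunning with different seeds and sizes should show the mean and maximum depths staying within a constant factor of $\log_2 n$, in line with the lemma.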

Figures (8)

  • Figure 1: Sketch of static and dynamic learning-augmented search trees. Since item 3 has a higher frequency around time $i$, dynamic search trees adjust the priority accordingly.
  • Figure 2: Zipfian distribution, $\alpha=1$.
  • Figure 3: Adversarial distribution.
  • Figure 4: Uniform distribution.
  • Figure 5: Inaccurate Prediction Oracle.
  • ...and 3 more figures

Theorems & Definitions (64)

  • Definition 2.1: Treap [AS89]
  • Lemma 2.1: [AS89]
  • Lemma 2.2: [AS89]
  • Theorem 2.3: Learning-Augmented Treap via Composite Priorities
  • Lemma 2.4
  • proof
  • Lemma 2.5
  • proof
  • proof: Proof of Theorem 2.3
  • Corollary 2.6
  • ...and 54 more