On the Power of Learning-Augmented Search Trees
Jingbang Chen, Xinyuan Cao, Alicia Stepin, Li Chen
TL;DR
The paper tackles the design of learning-augmented search structures that adapt to arbitrary input distributions and temporal access patterns. It introduces composite Treap priorities that blend a learned component with a random one, so that each item $x$ with predicted weight $w_x$ has expected depth $\mathbb{E}[\mathrm{depth}(x)] = O\big(\log_2(1/w_x)\big)$. Setting $w_x$ to the items' relative access frequencies yields static optimality. The paper extends these ideas to B-Trees via B-Treaps, and to dynamic settings in which the structure achieves the working-set bound. It also provides robustness guarantees, showing that prediction errors contribute only an additive term measured by cross-entropy/KL divergence, and it shows that the structures adapt when priorities change dynamically. Empirically, the proposed structures outperform traditional BSTs and prior learned-index approaches on Zipfian, adversarial, and uniform distributions, including scenarios with imperfect predictions. These results point to practical value for adaptive indexing and external-memory data structures.
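To see why prediction errors enter additively, one can combine the per-item depth bound with a standard information-theoretic identity. The derivation below assumes accesses are drawn from a true distribution $p$ and the tree is built from predicted weights $w$ with every $w_x > 0$. It illustrates the shape of the robustness claim and is not the paper's exact theorem statement:

$$
\sum_x p_x\,\mathbb{E}[\mathrm{depth}(x)]
= O\Big(\sum_x p_x \log_2\frac{1}{w_x}\Big)
= O\big(H(p) + D_{\mathrm{KL}}(p \,\|\, w)\big),
\qquad\text{since}\qquad
\sum_x p_x \log_2\frac{1}{w_x}
= \underbrace{\sum_x p_x \log_2\frac{1}{p_x}}_{H(p)}
+ \underbrace{\sum_x p_x \log_2\frac{p_x}{w_x}}_{D_{\mathrm{KL}}(p\,\|\,w)}.
$$

When the predictions are exact ($w = p$), the KL term vanishes and only the entropy term $H(p)$ remains. Otherwise the cost of bad predictions is exactly the divergence between the true and predicted distributions, added on top.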
Abstract
We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities. The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$. Specifically, each item $x$ is assigned a composite priority of $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$, where $U(0, 1)$ is a uniform random variable on $[0, 1]$. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality. This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML '22], which only work for Zipfian distributions, to arbitrary input distributions. Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP '09]. Our search trees can also exploit locality in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
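The composite priority can be illustrated with a small Python sketch. It assumes several things that the abstract does not specify. Logarithms are base 2, and a max-heap convention is used, so higher priority sits closer to the root. Weights must be strictly positive, and $1/w_x$ is clamped to at least 2 so that the double logarithm is defined. The clamp and all names (`composite_priority`, `build_treap`, `depths`) are hypothetical choices, not the authors' code. The tree is built as a Cartesian tree over the sorted keys, which gives exactly the treap for the drawn priorities.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    priority: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def composite_priority(w: float, rng: random.Random) -> float:
    """Return -floor(log2(log2(1/w))) + U(0,1); larger weights get larger priorities."""
    # Hypothetical clamp: keep 1/w >= 2 so the inner log2 is >= 1 and the outer log2 is defined.
    inv = max(2.0, 1.0 / w)
    tier = -math.floor(math.log2(math.log2(inv)))
    return tier + rng.random()  # rng.random() is uniform on [0, 1)


def build_treap(weights: dict[int, float], seed: int = 0) -> Optional[Node]:
    """Build a max-heap treap over the sorted keys via the stack-based Cartesian tree method."""
    rng = random.Random(seed)
    stack: list[Node] = []  # right spine of the tree built so far
    for key in sorted(weights):
        node = Node(key, composite_priority(weights[key], rng))
        last = None
        # Nodes on the right spine with lower priority become the new node's left subtree.
        while stack and stack[-1].priority < node.priority:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def depths(root: Optional[Node]) -> dict[int, int]:
    """Map each key to its depth (the root has depth 0)."""
    out: dict[int, int] = {}
    todo = [(root, 0)] if root else []
    while todo:
        node, d = todo.pop()
        out[node.key] = d
        for child in (node.left, node.right):
            if child is not None:
                todo.append((child, d + 1))
    return out


if __name__ == "__main__":
    # Hypothetical weights: one hot key and 99 cold keys (total weight 1.0).
    weights = {k: 0.005 for k in range(100)}
    weights[42] = 0.505
    root = build_treap(weights, seed=1)
    d = depths(root)
    print("root:", root.key, "hot depth:", d[42], "max depth:", max(d.values()))
```

The integer tier $-\lfloor\log\log(1/w_x)\rfloor$ dominates the random part. As a result, an item whose tier is strictly higher than every other item's is always the root. The $U(0,1)$ term only orders items within the same tier, so each tier behaves roughly like an ordinary randomized treap.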
