Nearly Optimal List Labeling
Michael A. Bender, Alex Conway, Martín Farach-Colton, Hanna Komlós, Michal Koucký, William Kuszmaul, Michael Saks
TL;DR
The paper tackles the dynamic list-labeling problem: maintaining a sorted set of up to $n$ elements in an array of size $m=(1+\Theta(1))n$ under online insertions and deletions while moving as few elements as possible. It introduces the See-Saw Algorithm, a randomized, history-dependent data structure that partitions the array into a recursive tree of subproblems, uses random rebuild windows, and adaptively allocates slots to subproblems based on past insertion patterns. The central result is an amortized expected cost of $O(\log n \operatorname{polyloglog} n)$ per operation, which matches the $\Omega(\log n)$ lower bound up to $\operatorname{polyloglog} n$ factors. The bound comes from combining random window sizes with adaptive array skews, using an analysis built around the See-Saw Lemma. The paper also develops a suite of reductions and detailed probabilistic analyses that bound rebuild costs, resets, and the likelihood of expensive leaves. Together, these show that the See-Saw approach comes within polyloglog factors of optimal for this classic problem, with potential implications for cache-oblivious data structures and related dynamic data-structure problems.
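To make the cost model concrete, here is a minimal sketch of a naive list-labeling structure. It is a hypothetical baseline written for illustration, not the See-Saw Algorithm and not code from the paper; the class name `NaiveListLabeling` and its methods `insert`, `delete`, and `labels` are assumptions. Elements sit in sorted order in an array of `m` slots with gaps (`None`). Each insertion shifts a run of elements toward the nearest free slot, and `moves` counts every element relocation, which is the quantity the list-labeling problem asks us to minimize.

```python
from bisect import bisect_left
from typing import List, Optional


class NaiveListLabeling:
    """Hypothetical naive baseline (NOT the See-Saw Algorithm).

    Stores integers in sorted order in an array of m slots, where None marks a gap.
    `moves` counts element relocations, the cost measure of list labeling.
    """

    def __init__(self, m: int) -> None:
        self.slots: List[Optional[int]] = [None] * m
        self.moves = 0

    def labels(self) -> List[int]:
        """Return the stored elements in array order, which is sorted order."""
        return [v for v in self.slots if v is not None]

    def delete(self, x: int) -> None:
        """Remove x and leave a gap behind; deletion moves no other element."""
        self.slots[self.slots.index(x)] = None

    def insert(self, x: int) -> None:
        m = len(self.slots)
        occupied = [i for i, v in enumerate(self.slots) if v is not None]
        if len(occupied) == m:
            raise OverflowError("array is full")
        k = bisect_left([self.slots[i] for i in occupied], x)  # number of elements < x
        # Target slot for x: immediately after its predecessor (slot 0 if it has none).
        p = occupied[k - 1] + 1 if k > 0 else 0

        # Case 1: a gap exists at or to the right of p.
        # Shift the occupied run slots[p:right] one slot to the right.
        right = next((j for j in range(p, m) if self.slots[j] is None), None)
        if right is not None:
            for i in range(right, p, -1):
                self.slots[i] = self.slots[i - 1]
            self.moves += right - p
            self.slots[p] = x
            return

        # Case 2: no gap to the right, so use the closest gap to the left.
        # Shift the occupied run slots[left+1:p] one slot to the left.
        left = max(j for j in range(p) if self.slots[j] is None)
        for i in range(left, p - 1):
            self.slots[i] = self.slots[i + 1]
        self.moves += p - 1 - left
        self.slots[p - 1] = x


# Descending insertions always land at the front, so the t-th insert shifts t elements.
ds = NaiveListLabeling(m=16)
for x in range(8, 0, -1):
    ds.insert(x)
print(ds.labels())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(ds.moves)     # 28 = 0 + 1 + ... + 7
```

This baseline can pay $\Theta(n)$ moves for a single insertion, for example when the input arrives in decreasing order. The algorithms discussed here avoid that cost by keeping gaps spread throughout the array and rebuilding regions of it, which brings the amortized cost down to polylogarithmic.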
Abstract
The list-labeling problem captures the basic task of storing a dynamically changing set of up to $n$ elements in sorted order in an array of size $m = (1 + \Theta(1))n$. The goal is to support insertions and deletions while moving around elements within the array as little as possible. Until recently, the best known upper bound stood at $O(\log^2 n)$ amortized cost. This bound, which was first established in 1981, was finally improved two years ago, when a randomized $O(\log^{3/2} n)$ expected-cost algorithm was discovered. The best randomized lower bound for this problem remains $\Omega(\log n)$, and closing this gap is considered to be a major open problem in data structures. In this paper, we present the See-Saw Algorithm, a randomized list-labeling solution that achieves a nearly optimal bound of $O(\log n \operatorname{polyloglog} n)$ amortized expected cost. This bound is achieved despite at least three lower bounds showing that this type of result is impossible for large classes of solutions.
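One way to read "nearly optimal" is to divide each upper bound by the $\log n$ lower bound. The quotient is the multiplicative gap each result leaves open. This is a back-of-the-envelope comparison of the bounds quoted above, not a derivation taken from the paper.

```latex
\frac{\log^2 n}{\log n} = \log n, \qquad
\frac{\log^{3/2} n}{\log n} = \log^{1/2} n, \qquad
\frac{\log n \cdot \operatorname{polyloglog} n}{\log n}
  = \operatorname{polyloglog} n = (\log\log n)^{O(1)}.
```

Because $(\log\log n)^{O(1)} = o(\log^{\varepsilon} n)$ for every fixed $\varepsilon > 0$, the remaining gap grows more slowly than any power of $\log n$. By contrast, the earlier bounds left gaps of $\log n$ and $\log^{1/2} n$.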
