Layered List Labeling
Michael A. Bender, Alex Conway, Martin Farach-Colton, Hanna Komlos, William Kuszmaul
TL;DR
This paper resolves a long-standing tension in list labeling by presenting a black-box composition framework that combines three list-labeling solutions into a single algorithm that inherits the best worst-case, adaptive, and expected bounds among them. The core construction, the embedding $F \triangleleft R$, pairs a fast emulator (the $F$-emulator) with a reliable shell (the $R$-shell) through a hierarchical buffer mechanism, using checkpointed rebuilds to preserve sortedness and lightly-amortized guarantees to bound costs. The authors prove two main composition theorems, showing that sequential and nested compositions preserve the desired properties, and they extend the framework to concrete consequences such as improved bounds for hammer-insert workloads and learning-augmented list labeling with predictions. The work provides a general, reusable technique for reconciling worst-case latency, adaptive behavior, and high throughput in dynamic data structures, with potential practical impact on databases and order-maintenance systems that rely on packed-memory arrays. The classical amortized bound of $O(\log^2 n)$ remains the reference baseline, but the framework shows that the best of all three worlds can be achieved by compositional design rather than by pushing a single method to extremes.
Abstract
The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, dating back to 1981, has amortized cost $O(\log^2 n)$. Subsequent work has led to improvements in three directions: \emph{low-latency} (worst-case) bounds; \emph{high-throughput} (expected) bounds; and (adaptive) bounds for \emph{important workloads}. Perhaps surprisingly, these three directions of research have remained almost entirely disjoint -- this is because, so far, the techniques that allow for progress in one direction have forced worsening bounds in the others. Thus there would appear to be a tension between worst-case, adaptive, and expected bounds. List labeling has been proposed for use in databases at least as early as PODS'99, but a database needs good throughput, good response time, and the ability to adapt to common workloads (e.g., bulk loads), and no current list-labeling algorithm achieves good bounds for all three. We show that this tension is not fundamental. In fact, with the help of new data-structural techniques, one can actually \emph{combine} any three list-labeling solutions in order to cherry-pick the best worst-case, adaptive, and expected bounds from each of them.
