Worst-case optimal adaptive alphabetic prefix-free coding
Travis Gagie
TL;DR
This work tackles one-pass adaptive alphabetic prefix-free coding with a block-based scheme inspired by Gilbert and Moore's 1959 alphabetic code (GM59). The scheme preserves lexicographic order and is worst-case optimal in both time and compression when $\sigma \in o(n^{1/2}/\log n)$. It builds a sequence of alphabetic codes, one per block. Each code is derived from a distribution that blends the symbol frequencies observed so far with a uniform baseline, and precomputed lookup tables allow each character to be encoded and decoded in constant time. The analysis bounds the output by $nH + O(n)$ bits, or more precisely by $n(H+2+o(1)) + O((\sigma \log \max(n,\sigma))^2)$ bits, with $O(n + \sigma \log n)$ total time. This is near-entropy-optimal in the stated regime, and per-character operations remain constant-time for larger $\sigma$, up to $O(n/\log n)$. The results thus extend adaptive alphabetic coding to worst-case optimality in both time and compression over a broad sublinear range of $\sigma$, although the construction is theoretical and not yet fully practical.
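To make the two ingredients concrete, the following Python sketch builds Gilbert–Moore codewords (for symbol $i$, the first $\lceil \log_2(1/p_i) \rceil + 1$ bits of the midpoint of its interval in the cumulative distribution) and rebuilds them before every block from the counts seen so far. It is an illustration under stated assumptions, not the paper's construction: the blending rule (add-one smoothing, which is one particular convex combination of the empirical and uniform distributions), the parameter `block_len`, and all function names are hypothetical. The decoder matches codewords bit by bit through a dictionary rather than using constant-time lookup tables.

```python
from fractions import Fraction


def blended_probs(counts):
    """Hypothetical blend: add-one smoothing, i.e. the empirical distribution
    with weight m/(m+sigma) mixed with the uniform one with weight sigma/(m+sigma)."""
    sigma, m = len(counts), sum(counts)
    return [Fraction(c + 1, m + sigma) for c in counts]


def ceil_log2_inv(p):
    """Smallest k >= 0 with 2**k * p >= 1, i.e. ceil(log2(1/p)) for 0 < p <= 1."""
    k = 0
    while p * (1 << k) < 1:
        k += 1
    return k


def gilbert_moore_code(probs):
    """Gilbert-Moore alphabetic code: symbol i gets the first ceil(log2(1/p_i)) + 1
    bits of the midpoint q_i = p_1 + ... + p_{i-1} + p_i / 2 (exact arithmetic)."""
    codes, cumulative = [], Fraction(0)
    for p in probs:
        q = cumulative + p / 2
        length = ceil_log2_inv(p) + 1
        value = int(q * (1 << length))  # floor, since q >= 0
        codes.append(format(value, "0{}b".format(length)))
        cumulative += p
    return codes


def encode_adaptive(text, sigma, block_len):
    """Hypothetical block scheme: before each block, rebuild the code from the
    counts of all previously seen symbols, then encode the block with it."""
    counts, out = [0] * sigma, []
    for start in range(0, len(text), block_len):
        codes = gilbert_moore_code(blended_probs(counts))
        block = text[start:start + block_len]
        out.extend(codes[s] for s in block)
        for s in block:
            counts[s] += 1
    return "".join(out)


def decode_adaptive(bits, n, sigma, block_len):
    """Mirror of encode_adaptive: rebuild the same code per block and parse it."""
    counts, out, pos = [0] * sigma, [], 0
    while len(out) < n:
        codes = gilbert_moore_code(blended_probs(counts))
        lookup = {c: i for i, c in enumerate(codes)}
        block = []
        while len(block) < block_len and len(out) + len(block) < n:
            # Prefix-freeness guarantees the first match is the right codeword.
            end = pos + 1
            while bits[pos:end] not in lookup:
                if end >= len(bits):
                    raise ValueError("truncated or malformed input")
                end += 1
            block.append(lookup[bits[pos:end]])
            pos = end
        out.extend(block)
        for s in block:
            counts[s] += 1
    return out
```

For example, `gilbert_moore_code([Fraction(1, 2), Fraction(1, 8), Fraction(1, 4), Fraction(1, 8)])` yields `["01", "1001", "110", "1111"]`: the codewords are prefix-free and appear in lexicographic order, which is what makes the code alphabetic. Because encoder and decoder update `counts` identically after each block, no code table needs to be transmitted.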
Abstract
We give the first algorithm for adaptive alphabetic prefix-free coding that is worst-case optimal in terms of time and compression when $\sigma \in o\left(\frac{n^{1/2}}{\log n}\right)$, where $\sigma$ is the size of the alphabet and $n$ is the length of the input.
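A short check shows where the threshold on $\sigma$ comes from, assuming $\sigma \le n$ so that $\log \max(n,\sigma) = \log n$. Substituting it into the refined bit bound and time bound stated in the TL;DR gives:

$$\sigma \in o\left(\frac{n^{1/2}}{\log n}\right) \implies \sigma \log n \in o\left(n^{1/2}\right) \implies (\sigma \log n)^2 \in o(n),$$

$$n(H+2+o(1)) + O\left((\sigma \log \max(n,\sigma))^2\right) = n(H+2+o(1)) + o(n) = n(H+2+o(1)),$$

$$O(n + \sigma \log n) = O(n) \quad \text{since } \sigma \log n \in o\left(n^{1/2}\right).$$

In this regime the additive cost of building the per-block codes is thus absorbed into the lower-order terms of both the compression and the time bounds.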
