Competitive Advantage of Huffman and Shannon-Fano Codes
Spencer Congero, Kenneth Zeger
TL;DR
This work compares lossless prefix codes through their competitive advantage, treating the comparison of codeword lengths as a two-player game. It proves that, for any non-dyadic source, a Huffman code has a positive competitive advantage over a Shannon-Fano code, while the competitive advantage of any prefix code over a Huffman code is strictly less than $\tfrac{1}{3}$, a bound that can be approached arbitrarily closely by suitable sources of each size $n>3$. An asymptotic converse to Cover's result for dyadic sources shows that the probability that a Huffman code for a randomly chosen non-dyadic source is competitively optimal converges to zero as the source size grows. The paper also shows that the competitive advantage of some prefix code over a Shannon-Fano code can approach $1$ as the source size increases, and it provides detailed classifications of small codes (notably for $n\le4$) together with experimental evidence that Huffman codes rapidly cease to be competitively optimal in large ensembles. These results clarify the limits of competitive optimality for common coding schemes and quantify the potential gains from alternative prefix codes for non-dyadic sources.
Abstract
For any finite discrete source, the competitive advantage of prefix code $C_1$ over prefix code $C_2$ is the probability $C_1$ produces a shorter codeword than $C_2$, minus the probability $C_2$ produces a shorter codeword than $C_1$. For any source, a prefix code is competitively optimal if it has a nonnegative competitive advantage over all other prefix codes. In 1991, Cover proved that Huffman codes are competitively optimal for all dyadic sources, namely sources whose symbol probabilities are negative integer powers of $2$. We prove the following asymptotic converse: As the source size grows, the probability a Huffman code for a randomly chosen non-dyadic source is competitively optimal converges to zero. We also prove: (i) For any non-dyadic source, a Huffman code has a positive competitive advantage over a Shannon-Fano code; (ii) For any source, the competitive advantage of any prefix code over a Huffman code is strictly less than $\frac{1}{3}$; (iii) For each integer $n>3$, there exists a source of size $n$ and some prefix code whose competitive advantage over a Huffman code is arbitrarily close to $\frac{1}{3}$; and (iv) For each positive integer $n$, there exists a source of size $n$ and some prefix code whose competitive advantage over a Shannon-Fano code becomes arbitrarily close to $1$ as $n\to\infty$.
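The definition of competitive advantage above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper: the function names are hypothetical, the Shannon-Fano codeword lengths are assumed to be $\lceil \log_2(1/p_i) \rceil$, and the example source $(\tfrac12, \tfrac13, \tfrac16)$ is chosen purely for demonstration. Exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction
from itertools import count
import heapq


def competitive_advantage(probs, lengths1, lengths2):
    """P(code 1 is shorter) minus P(code 2 is shorter), per the abstract's definition."""
    adv = Fraction(0)
    for p, l1, l2 in zip(probs, lengths1, lengths2):
        if l1 < l2:
            adv += p
        elif l2 < l1:
            adv -= p
    return adv


def huffman_lengths(probs):
    """Codeword lengths of one Huffman code (ties broken by insertion order)."""
    tie = count()  # unique tiebreaker so heap entries never compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every symbol under the merged node gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def shannon_fano_lengths(probs):
    """Assumed Shannon-Fano lengths: the smallest l with 2**(-l) <= p."""
    lengths = []
    for p in probs:
        l = 0
        while p * 2**l < 1:
            l += 1
        lengths.append(l)
    return lengths


# Hypothetical non-dyadic example source.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
huff = huffman_lengths(probs)      # [1, 2, 2]
sf = shannon_fano_lengths(probs)   # [1, 2, 3]
adv = competitive_advantage(probs, huff, sf)
print(adv)  # 1/6
```

For this source the two codes differ only on the least likely symbol, so the Huffman code's competitive advantage over the assumed Shannon-Fano code is $\tfrac16 > 0$, consistent with result (i). For a dyadic source such as $(\tfrac12, \tfrac14, \tfrac14)$, both sets of lengths coincide and the advantage is $0$.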
