Competitive Advantage of Huffman and Shannon-Fano Codes

Spencer Congero, Kenneth Zeger

TL;DR

This work studies lossless prefix codes through pairwise comparisons of codeword lengths, viewed as a two-player game. It proves that, for every non-dyadic source, a Huffman code has a positive competitive advantage over a Shannon-Fano code, while the competitive advantage of any prefix code over a Huffman code is strictly less than $\tfrac{1}{3}$, a bound that can be approached arbitrarily closely for suitable sources of each size $n>3$. An asymptotic converse to Cover's 1991 result shows that the probability that a Huffman code for a randomly chosen non-dyadic source is competitively optimal converges to zero as the source size grows. The paper also shows that the competitive advantage of some prefix code over a Shannon-Fano code can approach $1$ as the source size increases, and it classifies small cases (notably $n\le4$). Experiments on $10^6$ randomly chosen sources (Figure 4) indicate rapid convergence: at $n=15$ about $99\%$ of the sampled sources had Huffman codes that were not competitively optimal, and for $n\ge31$ none of the sampled sources had a competitively optimal Huffman code. These results delimit the competitive optimality of Huffman and Shannon-Fano codes and quantify the possible gains from alternative prefix codes for non-dyadic sources.

Abstract

For any finite discrete source, the competitive advantage of prefix code $C_1$ over prefix code $C_2$ is the probability $C_1$ produces a shorter codeword than $C_2$, minus the probability $C_2$ produces a shorter codeword than $C_1$. For any source, a prefix code is competitively optimal if it has a nonnegative competitive advantage over all other prefix codes. In 1991, Cover proved that Huffman codes are competitively optimal for all dyadic sources, namely sources whose symbol probabilities are negative integer powers of $2$. We prove the following asymptotic converse: As the source size grows, the probability a Huffman code for a randomly chosen non-dyadic source is competitively optimal converges to zero. We also prove: (i) For any non-dyadic source, a Huffman code has a positive competitive advantage over a Shannon-Fano code; (ii) For any source, the competitive advantage of any prefix code over a Huffman code is strictly less than $\frac{1}{3}$; (iii) For each integer $n>3$, there exists a source of size $n$ and some prefix code whose competitive advantage over a Huffman code is arbitrarily close to $\frac{1}{3}$; and (iv) For each positive integer $n$, there exists a source of size $n$ and some prefix code whose competitive advantage over a Shannon-Fano code becomes arbitrarily close to $1$ as $n\to\infty$.
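The definition of competitive advantage above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, Huffman ties are broken arbitrarily, and the Shannon-Fano code is assumed to assign symbol $i$ a codeword of length $\lceil \log_2(1/p_i) \rceil$, which this summary does not state explicitly. Only codeword lengths matter for the comparison, so the codewords themselves are never built.

```python
import heapq
from fractions import Fraction


def competitive_advantage(probs, lengths_1, lengths_2):
    """P(code 1 is shorter) minus P(code 2 is shorter), per the abstract's definition."""
    win = sum(p for p, a, b in zip(probs, lengths_1, lengths_2) if a < b)
    lose = sum(p for p, a, b in zip(probs, lengths_1, lengths_2) if a > b)
    return win - lose


def huffman_lengths(probs):
    """Codeword lengths of one Huffman code (ties broken arbitrarily)."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def shannon_fano_lengths(probs):
    """Assumed Shannon-Fano lengths: smallest l with 2**-l <= p, i.e. ceil(log2(1/p))."""
    lengths = []
    for p in probs:
        l = 0
        while Fraction(1, 2**l) > p:
            l += 1
        lengths.append(l)
    return lengths


# A non-dyadic source of size 3, with exact rational probabilities.
source = [Fraction(2, 5), Fraction(3, 10), Fraction(3, 10)]
huff = huffman_lengths(source)        # lengths 1, 2, 2
sf = shannon_fano_lengths(source)     # lengths 2, 2, 2
advantage = competitive_advantage(source, huff, sf)
print(huff, sf, advantage)
```

For this source the Huffman code is shorter on the first symbol and tied on the other two, so its competitive advantage over the assumed Shannon-Fano code is $\frac{2}{5}$, consistent with result (i). For the dyadic source $(\frac12, \frac14, \frac14)$ the two length vectors coincide and the advantage is $0$.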

Paper Structure (14 sections, 32 theorems, 65 equations, 4 figures)

Key Result

Lemma 1.1

For any source, if a prefix code is expected length optimal, then it is monotone.
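A short sketch can illustrate why the lemma is plausible. The term "monotone" is not defined in this summary; the code below assumes it means that a strictly more probable symbol never receives a longer codeword. The helper names `is_monotone` and `expected_length` are hypothetical. Under that assumption, a code that is not monotone can be improved by swapping the two offending codewords, so it cannot be expected length optimal.

```python
from fractions import Fraction


def is_monotone(probs, lengths):
    """Assumed definition: p_i > p_j implies l_i <= l_j for all symbol pairs."""
    return all(
        lengths[i] <= lengths[j]
        for i in range(len(probs))
        for j in range(len(probs))
        if probs[i] > probs[j]
    )


def expected_length(probs, lengths):
    """Average codeword length under the source distribution."""
    return sum(p * l for p, l in zip(probs, lengths))


source = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
bad = [2, 1, 2]    # the most probable symbol gets a longer codeword
good = [1, 2, 2]   # same multiset of lengths, with the first two swapped

print(is_monotone(source, bad), expected_length(source, bad))    # not monotone
print(is_monotone(source, good), expected_length(source, good))  # monotone
```

Swapping the lengths of the first two symbols reduces the expected length from $\frac{17}{10}$ to $\frac{3}{2}$, which is the exchange argument in miniature.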

Figures (4)

  • Figure 1: A code tree for a prefix code of a source of size $3$.
  • Figure 2: Code trees of four prefix codes for a source of size $6$.
  • Figure 3: Two Huffman trees and an optimal third code tree for a single source.
  • Figure 4: Lower bound on the fraction of $10^6$ randomly chosen sources whose Huffman code is not competitively optimal, as a function of the source size $n$. For $n=15$ Huffman codewords, about $99\%$ of randomly selected sources did not have competitively optimal Huffman codes. For $n \ge 31$, all $10^6$ randomly chosen sources had Huffman codes that were not competitively optimal.

Theorems & Definitions (55)

  • Lemma 1.1: e.g., Gallager-IT-1978
  • Lemma 1.2: Kraft, e.g., Cover-Thomas-book-2006
  • Lemma 1.3
  • Theorem 1.4: Cover Cover-1991
  • Lemma 1.5: Yamamoto and Itoh Yamamoto-Itoh-1995
  • Lemma 1.6: Manickman Manickman-2019
  • Corollary 1.7
  • Example 1.8: Two Huffman codes
  • Example 1.9: Two Huffman codes and two other codes
  • Lemma 2.1: e.g., Devroye-book
  • ...and 45 more