Finite Block Length Rate-Distortion Theory for the Bernoulli Source with Hamming Distortion: A Tutorial

Bhaskar Krishnamachari

TL;DR

This work derives the classical rate-distortion function $R(D) = H(p) - H(D)$ from first principles, illustrates its computation via the Blahut-Arimoto algorithm, and develops the finite block length refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length grows.

Abstract

Lossy data compression lies at the heart of modern communication and storage systems. Shannon's rate-distortion theory provides the fundamental limit on how much a source can be compressed at a given fidelity, but it assumes infinitely long block lengths that are never realized in practice. We present a self-contained tutorial on rate-distortion theory for the simplest non-trivial source: a $\mathrm{Bernoulli}(p)$ sequence with Hamming distortion. We derive the classical rate-distortion function $R(D) = H(p) - H(D)$ from first principles, illustrate its computation via the Blahut-Arimoto algorithm, and then develop the finite block length refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length $n$ grows. The central quantity in this refinement is the rate-distortion dispersion $V(D)$, which governs the $O(1/\sqrt{n})$ penalty for operating at finite block lengths. We accompany all theoretical developments with numerical examples and figures generated by accompanying Python scripts.
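To make the closed form concrete, here is a minimal Python sketch that evaluates $R(D) = H(p) - H(D)$ in bits. The function names `binary_entropy` and `rate_distortion` are illustrative choices, not taken from the paper's scripts, and the sketch assumes the usual convention that $R(D) = 0$ once $D \geq \min(p, 1-p)$.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, using the convention H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion(p: float, D: float) -> float:
    """R(D) = H(p) - H(D) for a Bernoulli(p) source with Hamming distortion.

    Assumption: R(D) is taken to be zero for D >= min(p, 1 - p), since the
    decoder can then always output the more likely symbol at zero rate.
    """
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


if __name__ == "__main__":
    # Rates for a few distortion levels at source bias p = 0.3.
    for D in (0.0, 0.05, 0.1, 0.2, 0.3):
        print(f"D = {D:.2f}  R(D) = {rate_distortion(0.3, D):.4f} bits")
```

At $D = 0$ the function returns the source entropy $H(p)$, and it decreases to zero as $D$ approaches $\min(p, 1-p)$.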

Paper Structure (34 sections, 1 theorem, 62 equations, 12 figures, 1 algorithm)

Introduction
Probability and Information Foundations
Random Variables and Probability
Entropy
Sequences and Typical Sequences
Mutual Information
The Rate-Distortion Problem
What Is Lossy Compression?
Distortion Measures
The Fundamental Question
The Test Channel and the Rate-Distortion Function
The Rate-Distortion Function for the Bernoulli Source
Setting Up the Optimization
The Optimal Test Channel
The Closed-Form Result
...and 19 more sections

Key Result

Theorem 6.1

For a discrete memoryless source with rate-distortion function $R(D)$ and dispersion $V(D) > 0$, the minimum rate $R(n, D, \varepsilon)$ at block length $n$ and excess-distortion probability $\varepsilon \in (0, 1)$ satisfies

$$R(n, D, \varepsilon) = R(D) + \sqrt{\frac{V(D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right),$$

where $Q^{-1}(\varepsilon)$ is the inverse of the Gaussian $Q$-function, $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$.
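As a rough illustration of how the dispersion enters, the sketch below evaluates the Gaussian approximation $R(D) + \sqrt{V(D)/n}\, Q^{-1}(\varepsilon)$ and ignores all lower-order terms. The function names are hypothetical. The expression used for the Bernoulli dispersion, $V = p(1-p)\log_2^2\frac{1-p}{p}$, is not stated in this summary; it is the form given by Kostina and Verdú for the binary memoryless source with Hamming distortion and should be treated here as an assumption.

```python
import math
from statistics import NormalDist


def q_inverse(eps: float) -> float:
    """Inverse Gaussian Q-function: Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def bernoulli_dispersion(p: float) -> float:
    """Assumed dispersion (bits^2) of a Bernoulli(p) source, Hamming distortion.

    Assumption: V = p (1 - p) log2^2((1 - p) / p), independent of D for
    0 < D < min(p, 1 - p). It vanishes at p = 1/2, where the theorem's
    condition V(D) > 0 does not hold.
    """
    return p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2


def normal_approx_rate(R: float, V: float, n: int, eps: float) -> float:
    """Gaussian approximation R + sqrt(V / n) * Q^{-1}(eps), in bits per symbol.

    Lower-order terms in n are deliberately dropped in this sketch.
    """
    return R + math.sqrt(V / n) * q_inverse(eps)


if __name__ == "__main__":
    p, eps = 0.3, 0.01
    R = 0.5  # placeholder value of R(D) in bits, chosen only for illustration
    V = bernoulli_dispersion(p)
    for n in (100, 1000, 10000):
        print(n, normal_approx_rate(R, V, n, eps))
```

For $\varepsilon < 1/2$ the penalty term is positive and shrinks like $1/\sqrt{n}$, so the approximate rate decreases toward $R(D)$ as $n$ grows.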

Figures (12)

Figure 1: The binary entropy function $H(p)$ versus the source bias $p$. The entropy is maximized at $p = 1/2$, where each bit carries one full bit of information, and vanishes at $p \in \{0, 1\}$, where the source is deterministic.
Figure 2: Top: the operational lossy compression setup with encoder and decoder. Bottom: the test channel $p_{\hat{X}|X}$ that abstracts away the codebook structure. The rate-distortion function minimizes mutual information $I(X;\hat{X})$ over all test channels satisfying the distortion constraint.
Figure 3: The rate-distortion function $R(D) = H(p) - H(D)$ for a $\mathrm{Bernoulli}(p)$ source with Hamming distortion, shown for $p \in \{0.11, 0.2, 0.3, 0.5\}$. Each curve is convex and decreasing, starting at $R(0) = H(p)$ and reaching zero at $D = \min(p, 1-p)$. The $p = 0.5$ curve starts highest because the fair coin has the most entropy.
Figure 4: Convergence of the Blahut-Arimoto algorithm for $p = 0.3$ and slope parameters $s \in \{2, 5, 10, 20\}$. The rate converges monotonically to its final value within a few tens of iterations.
Figure 5: Comparison of the Blahut-Arimoto computed rate-distortion points (circles) with the closed-form curve $R(D) = H(p) - H(D)$ (solid line) for $p = 0.3$. The agreement is exact to numerical precision.
...and 7 more figures
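The Blahut-Arimoto computation referred to in Figures 4 and 5 can be sketched in a few lines of pure Python. This is a generic textbook version of the iteration specialized to a binary source with Hamming distortion, not the paper's own script; the function name, the uniform initialization, and the fixed iteration count are assumptions. The slope parameter $s$ is in nats and the returned rate is in bits.

```python
import math


def blahut_arimoto_binary(p: float, s: float, num_iters: int = 200):
    """Blahut-Arimoto iteration for a Bernoulli(p) source, Hamming distortion.

    Returns one (rate_in_bits, distortion) point on the R(D) curve for
    slope parameter s > 0. Assumes 0 < p < 1.
    """
    source = [1.0 - p, p]            # P(X = 0), P(X = 1)
    d = [[0.0, 1.0], [1.0, 0.0]]     # Hamming distortion d[x][y]
    q = [0.5, 0.5]                   # initial reproduction marginal (assumed uniform)

    for _ in range(num_iters):
        # Step 1: test channel proportional to q(y) * exp(-s * d(x, y)).
        cond = []
        for x in range(2):
            w = [q[y] * math.exp(-s * d[x][y]) for y in range(2)]
            total = sum(w)
            cond.append([wy / total for wy in w])
        # Step 2: reproduction marginal induced by the source and test channel.
        q = [sum(source[x] * cond[x][y] for x in range(2)) for y in range(2)]

    # Expected distortion and mutual information of the final test channel.
    distortion = sum(
        source[x] * cond[x][y] * d[x][y] for x in range(2) for y in range(2)
    )
    rate = sum(
        source[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
        for x in range(2)
        for y in range(2)
        if cond[x][y] > 0.0 and q[y] > 0.0
    )
    return rate, distortion


if __name__ == "__main__":
    for s in (2.0, 5.0, 10.0, 20.0):
        rate, dist = blahut_arimoto_binary(0.3, s)
        print(f"s = {s:5.1f}  D = {dist:.6f}  R = {rate:.6f} bits")
```

When the resulting distortion is below $\min(p, 1-p)$, the returned pair should lie on the closed-form curve $R(D) = H(p) - H(D)$, with $D = 1/(1 + e^{s})$.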

Theorems & Definitions (13)

Definition 2.1: Binary Entropy
Definition 2.2: Mutual Information
Definition 3.1: Rate-Distortion Function
Remark 4.1
Remark 4.2
Remark 5.1
Remark 5.2: Intuition for Step 1
Definition 6.1: $(n, M, D, \varepsilon)$ Code
Definition 6.2: $d$-Tilted Information [kostina2012]
proof
...and 3 more
