Finite Block Length Rate-Distortion Theory for the Bernoulli Source with Hamming Distortion: A Tutorial
Bhaskar Krishnamachari
TL;DR
This work derives the classical rate-distortion function $RD = Hp - HD$ from first principles, illustrates its computation via the Blahut-Arimoto algorithm, and develops the finite block length refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length grows.
Abstract
Lossy data compression lies at the heart of modern communication and storage systems. Shannon's rate-distortion theory provides the fundamental limit on how much a source can be compressed at a given fidelity, but it assumes infinitely long block lengths that are never realized in practice. We present a self-contained tutorial on rate-distortion theory for the simplest non-trivial source: a Bernoulli$(p)$ sequence with Hamming distortion. We derive the classical rate-distortion function $RD = Hp - HD$ from first principles, illustrate its computation via the Blahut-Arimoto algorithm, and then develop the finite block length refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length $n$ grows. The central quantity in this refinement is the \emph{rate-distortion dispersion} $V(D)$, which governs the $O(1/\sqrt{n})$ penalty for operating at finite block lengths. We accompany all theoretical developments with numerical examples and figures generated by accompanying Python scripts.
