Sliding Window Codes: Near-Optimality and Q-Learning for Zero-Delay Coding

Liam Cregg; Fady Alajaji; Serdar Yuksel

TL;DR

This work tackles zero-delay coding of a Markov source over a noisy channel with feedback by recasting the problem as an MDP with a probability-belief state and a quantizer action. It introduces a practical sliding finite window belief MDP that yields near-optimal policies with explicit performance bounds, and an RL algorithm (Q-learning) that provably converges to these near-optimal policies under predictor stability. An alternative belief-quantization scheme is analyzed and compared, with convergence results under invariant start conditions; both approaches provide rigorous guarantees that the learned policies achieve distortion within $\epsilon$ of the optimum for sufficiently large window length or discretization level. Simulations corroborate the theory, showing near-optimal performance and favorable trade-offs against memoryless encoding and Lloyd–Max-type baselines, with implications for average-cost settings as $\beta\to1$.

Abstract

We study the problem of zero-delay coding for the transmission of a Markov source over a noisy channel with feedback and present a reinforcement learning solution which is guaranteed to achieve near-optimality. To this end, we formulate the problem as a Markov decision process (MDP) where the state is a probability-measure valued predictor/belief and the actions are quantizer maps. This MDP formulation has been used to show the optimality of certain classes of encoder policies in prior work, but their computation is prohibitively complex due to the uncountable nature of the constructed state space and the lack of minorization or strong ergodicity results. These challenges invite rigorous reinforcement learning methods, which entail several open questions: can we approximate this MDP with a finite-state one with some performance guarantee? Can we ensure convergence of a reinforcement learning algorithm for this approximate MDP? What regularity assumptions are required for the above to hold? We address these questions as follows: we present an approximation of the belief MDP using a sliding finite window of channel outputs and quantizers. Under an appropriate notion of predictor stability, we show that policies based on this finite window are near-optimal, in the sense that the lowest distortion achievable by such a policy approaches the true lowest distortion as the window length increases. We give sufficient conditions for predictor stability to hold. Finally, we propose a Q-learning algorithm which provably converges to a near-optimal policy and provide a detailed comparison of the sliding finite window scheme with another approximation scheme which quantizes the belief MDP in a nearest neighbor fashion.
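To make the sliding finite window idea concrete, here is a minimal Python sketch of tabular Q-learning in which the state is the last N (quantizer, channel output) pairs. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the 3-symbol source, the binary symmetric channel, squared-error distortion, the uniform fixed prior, uniformly random exploration, and the 1/k step size. The stage cost is computed from the belief obtained by running the predictor update from the fixed prior over the window only, which is one plausible reading of the approximation described above.

```python
import itertools
import random

# Hypothetical toy model (illustrative, not from the paper).
X = [0, 1, 2]                      # source alphabet
M = [0, 1]                         # channel input/output alphabet
P = [[0.8, 0.1, 0.1],              # P[x][x2] = Pr(X_{t+1} = x2 | X_t = x)
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
FLIP = 0.1                         # crossover probability of the binary channel
BETA = 0.9                         # discount factor
N = 2                              # window length
PRIOR = [1 / 3, 1 / 3, 1 / 3]      # fixed prior assumed N steps in the past

# Actions: every quantizer map X -> M, stored as a tuple q with q[x] the channel input.
QUANTIZERS = list(itertools.product(M, repeat=len(X)))


def channel(y, m):
    """Pr(channel output y | channel input m)."""
    return 1 - FLIP if y == m else FLIP


def predictor_update(pi, q, y):
    """Bayes step from Pr(X_t | past) to Pr(X_{t+1} | past, y) under quantizer q."""
    joint = [pi[x] * channel(y, q[x]) for x in X]
    total = sum(joint)             # positive because FLIP is strictly between 0 and 1
    post = [j / total for j in joint]
    return [sum(post[x] * P[x][x2] for x in X) for x2 in X]


def stage_cost(pi, q):
    """Expected squared error when the decoder picks its best estimate for each y."""
    return sum(
        min(sum(pi[x] * channel(y, q[x]) * (x - xh) ** 2 for x in X) for xh in X)
        for y in M
    )


def window_belief(window):
    """Approximate predictor: run the update from PRIOR over the window only."""
    pi = PRIOR
    for a, y in window:
        pi = predictor_update(pi, QUANTIZERS[a], y)
    return pi


def q_learning(steps=20000, seed=0):
    """Tabular Q-learning over window states with uniformly random exploration."""
    rng = random.Random(seed)
    table, visits = {}, {}
    x = rng.choice(X)
    window = tuple((0, 0) for _ in range(N))      # arbitrary initial window
    for _ in range(steps):
        row = table.setdefault(window, [0.0] * len(QUANTIZERS))
        a = rng.randrange(len(QUANTIZERS))        # explore uniformly at random
        q = QUANTIZERS[a]
        y = q[x] if rng.random() > FLIP else 1 - q[x]   # pass q(x) through the channel
        cost = stage_cost(window_belief(window), q)
        nxt = window[1:] + ((a, y),)              # slide the window by one step
        nxt_row = table.setdefault(nxt, [0.0] * len(QUANTIZERS))
        k = visits[(window, a)] = visits.get((window, a), 0) + 1
        row[a] += (cost + BETA * min(nxt_row) - row[a]) / k
        x = rng.choices(X, weights=P[x])[0]       # advance the Markov source
        window = nxt
    return table


def greedy_quantizer(table, window):
    """Quantizer with the lowest learned Q-value for a visited window state."""
    row = table[window]
    return QUANTIZERS[min(range(len(row)), key=row.__getitem__)]
```

Calling `q_learning()` returns a table keyed by window states, and `greedy_quantizer(table, window)` reads off the learned quantizer for any visited window. The table has at most (|quantizers| x |M|)^N rows, so growing N trades memory and exploration time for a closer approximation of the belief state.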

Paper Structure (23 sections, 19 theorems, 93 equations, 6 figures, 1 table)

Introduction
Preliminaries: Optimal Coding Problem and its MDP Formulation
Optimal Zero-Delay Coding and Existence of an Optimal Policy
Regularity Properties of the Markov Decision Process
Filter and Predictor Stability
A Note on MDP Notation
Sliding Finite Window Approximation of the Belief MDP
The Sliding Finite Window Belief MDP
Sliding Finite Window Approximation
Bounds on the Loss Term
Q-learning: Convergence to Near-Optimality
An Alternative Approximation Scheme and Comparison
Nearest Neighbor Approximation
Reinforcement Learning Theoretic Comparison of the Two Schemes
Implications for the Average Cost Problem
...and 8 more sections

Key Result

Proposition 2.5

[wood2016optimal] For any $\beta \in (0,1)$, there exists $\gamma^* \in \Gamma_{\text{WS}}$ that solves the discounted distortion problem (that is, it minimizes the discounted distortion cost, labeled eq:dis_cost in the paper) for all priors $\pi_0 \in \mathcal{P}(\mathcal{X})$.
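The cost that eq:dis_cost refers to is not reproduced on this page. As a reading aid, the discounted distortion criterion in problems of this type usually takes the following form. This is a sketch under assumptions: the distortion function $d$, the reconstruction $\hat{X}_t$, and the symbol $J_\beta$ are notation introduced here and may differ from the paper's.

```latex
% Assumed notation: d is a per-symbol distortion, \hat{X}_t the decoder output at time t.
J_\beta(\pi_0, \gamma)
  = \mathbf{E}^{\gamma}_{\pi_0}\!\left[ \sum_{t=0}^{\infty} \beta^{t}\, d\big(X_t, \hat{X}_t\big) \right],
\qquad
J_\beta^{*}(\pi_0) = \inf_{\gamma} J_\beta(\pi_0, \gamma).
```

Read this way, the proposition says that the infimum is attained by a single policy $\gamma^* \in \Gamma_{\text{WS}}$ for every prior $\pi_0$.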

Figures (6)

Figure 1: Source-channel coding with feedback
Figure 2: Comparison with Lloyd-Max
Figure 3: Quantized belief scheme vs memoryless encoding
Figure 4: Finite memory scheme vs memoryless encoding
Figure 5: Quantized belief scheme with unknown optimum
...and 1 more figures

Theorems & Definitions (27)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Proposition 2.5
Proposition 2.6
Definition 2.7
Proposition 2.8
Lemma 2.9
Lemma 2.10
...and 17 more
