Sliding Window Codes: Near-Optimality and Q-Learning for Zero-Delay Coding

Liam Cregg, Fady Alajaji, Serdar Yuksel

TL;DR

This work tackles zero-delay coding of a Markov source over a noisy channel with feedback by recasting the problem as an MDP with a probability-belief state and a quantizer action. It introduces a practical sliding finite window belief MDP that yields near-optimal policies with explicit performance bounds, and an RL algorithm (Q-learning) that provably converges to these near-optimal policies under predictor stability. An alternative belief-quantization scheme is analyzed and compared, with convergence results under invariant start conditions; both approaches provide rigorous guarantees that the learned policies achieve distortion within $\epsilon$ of the optimum for sufficiently large window length or discretization level. Simulations corroborate the theory, showing near-optimal performance and favorable trade-offs against memoryless encoding and Lloyd–Max-type baselines, with implications for average-cost settings as $\beta\to1$.

Abstract

We study the problem of zero-delay coding for the transmission of a Markov source over a noisy channel with feedback and present a reinforcement learning solution which is guaranteed to achieve near-optimality. To this end, we formulate the problem as a Markov decision process (MDP) where the state is a probability-measure valued predictor/belief and the actions are quantizer maps. This MDP formulation has been used to show the optimality of certain classes of encoder policies in prior work, but their computation is prohibitively complex due to the uncountable nature of the constructed state space and the lack of minorization or strong ergodicity results. These challenges invite rigorous reinforcement learning methods, which entail several open questions: can we approximate this MDP with a finite-state one with some performance guarantee? Can we ensure convergence of a reinforcement learning algorithm for this approximate MDP? What regularity assumptions are required for the above to hold? We address these questions as follows: we present an approximation of the belief MDP using a sliding finite window of channel outputs and quantizers. Under an appropriate notion of predictor stability, we show that policies based on this finite window are near-optimal, in the sense that the lowest distortion achievable by such a policy approaches the true lowest distortion as the window length increases. We give sufficient conditions for predictor stability to hold. Finally, we propose a Q-learning algorithm which provably converges to a near-optimal policy and provide a detailed comparison of the sliding finite window scheme with another approximation scheme which quantizes the belief MDP in a nearest-neighbor fashion.
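To make the sliding finite window idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary Markov source, a binary symmetric channel, Hamming distortion, a window of length 1 (the previous quantizer and channel output), uniform random exploration, and a learning rate of one over the visit count. The function name, the matrices `P` and `T`, and all numerical values are hypothetical choices made for illustration only.

```python
import itertools
import numpy as np

def run_sliding_window_q_learning(steps=20000, beta=0.9, seed=0):
    """Tabular Q-learning on a window-1 approximation (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed source kernel P[x, x_next]
    T = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed channel T[m, y] = P(y | m)
    # All quantizer maps q: {0,1} -> {0,1}, stored as tuples with q[x] = channel input.
    quantizers = list(itertools.product(range(2), repeat=2))
    d = 1.0 - np.eye(2)                      # Hamming distortion d[x, x_hat]
    mu = np.linalg.matrix_power(P, 200)[0]   # approximate stationary law, used as fixed prior

    def predictor(q_idx, y):
        # Approximate belief on the current source symbol: start from the fixed
        # prior mu, condition on the window (q, y), then push through the source kernel.
        q = quantizers[q_idx]
        w = mu * T[list(q), y]               # mu(x) * P(y | q(x))
        return (w / w.sum()) @ P

    def cost(pi, q_idx):
        # Expected one-step distortion when the decoder picks, for each channel
        # output y, the reconstruction x_hat minimizing the conditional distortion.
        q = quantizers[q_idx]
        joint = pi[:, None] * T[list(q), :]  # joint[x, y]
        return (d.T @ joint).min(axis=0).sum()

    n_actions = len(quantizers)
    n_states = n_actions * 2                 # state = (previous quantizer, previous output)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros_like(Q)
    x = rng.choice(2, p=mu)                  # initial source symbol
    s = 0                                    # arbitrary initial window

    for _ in range(steps):
        a = rng.integers(n_actions)          # uniform exploration over quantizers
        c = cost(predictor(s // 2, s % 2), a)
        y = rng.choice(2, p=T[quantizers[a][x]])  # channel output, fed back to the encoder
        s_next = a * 2 + y                   # slide the window
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        # Cost-minimizing Q-learning update (min instead of max).
        Q[s, a] += alpha * (c + beta * Q[s_next].min() - Q[s, a])
        x = rng.choice(2, p=P[x])            # source moves to its next symbol
        s = s_next

    return Q, Q.argmin(axis=1)               # Q-table and greedy quantizer per window
```

Calling `run_sliding_window_q_learning()` returns the learned Q-table and, for each window state, the index of the greedy quantizer. Longer windows would enlarge the state space exponentially in the window length, which is the trade-off the abstract describes between window length and closeness to the optimal distortion.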

Paper Structure (23 sections, 19 theorems, 93 equations, 6 figures, 1 table)

Key Result

Proposition 2.5

[wood2016optimal] For any $\beta \in (0,1)$, there exists $\gamma^* \in \Gamma_{\text{WS}}$ that solves the discounted distortion problem (that is, it minimizes the discounted distortion cost) for all priors $\pi_0 \in \mathcal{P}(\mathcal{X})$.

Figures (6)

  • Figure 1: Source-channel coding with feedback
  • Figure 2: Comparison with Lloyd-Max
  • Figure 3: Quantized belief scheme vs memoryless encoding
  • Figure 4: Finite memory scheme vs memoryless encoding
  • Figure 5: Quantized belief scheme with unknown optimum
  • ...and 1 more figure

Theorems & Definitions (27)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Proposition 2.5
  • Proposition 2.6
  • Definition 2.7
  • Proposition 2.8
  • Lemma 2.9
  • Lemma 2.10
  • ...and 17 more