Reinforcement Learning for Near-Optimal Design of Zero-Delay Codes for Markov Sources

Liam Cregg, Tamas Linder, Serdar Yuksel

TL;DR

This work tackles zero-delay lossy coding for finite-alphabet Markov sources by recasting encoder design as a Markov decision process over belief states $\pi_t \in \mathcal{P}(\mathbb{X})$ and solving it with a quantized Q-learning approach. The authors prove near-optimality for the discounted distortion when the source starts from its invariant distribution, and extend the results to the average-cost criterion by relating the discounted cost $J_\beta$ to the average cost $J$, supported by an analysis of unique ergodicity under a uniform exploration policy. They introduce Algorithm 1 (quantized Q-learning on the belief space) and prove its convergence under mild assumptions, giving a concrete, implementable method for designing near-optimal zero-delay codes. Simulations for finite and continuous sources compare Algorithm 1 against existing baselines (O-FSSQ and Lloyd-Max) and support the practical viability of the approach for real-time, delay-sensitive systems. The work thus connects stochastic control and reinforcement learning to yield provably near-optimal zero-delay coding policies for sources with memory, with extensions to noisy channels and finite-window architectures discussed as future directions.
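To make the belief-space formulation concrete, the following is a minimal sketch of quantized Q-learning for a small finite source. It is not the paper's Algorithm 1 verbatim: the function names, the grid quantization of the simplex (denominators `n`), the step size `1 / visits` and the use of the expected per-stage distortion as the cost are all assumptions made for illustration. The action at each step is a quantizer mapping source symbols to channel symbols, chosen uniformly at random during learning.

```python
import itertools
import numpy as np

def all_quantizers(n_source, n_symbols):
    """Every map from source symbols to channel symbols, as index tuples."""
    return list(itertools.product(range(n_symbols), repeat=n_source))

def stage_cost(belief, quantizer, n_symbols, dist):
    """Expected distortion when the decoder picks the best reconstruction per bin."""
    q = np.asarray(quantizer)
    cost = 0.0
    for m in range(n_symbols):
        mass = np.where(q == m, belief, 0.0)   # belief restricted to bin m
        cost += (mass @ dist).min()            # best reconstruction for bin m
    return cost

def belief_update(belief, quantizer, symbol, P):
    """Condition on the received symbol, then apply the transition matrix."""
    mass = np.where(np.asarray(quantizer) == symbol, belief, 0.0)
    return (mass / mass.sum()) @ P

def quantize_belief(belief, n):
    """Round a belief to the grid {k/n} on the simplex (largest-remainder rule)."""
    scaled = belief * n
    k = np.floor(scaled).astype(int)
    short = int(n - k.sum())                   # units still to distribute
    order = np.argsort(scaled - k)[::-1]       # largest fractional parts first
    k[order[:short]] += 1
    return tuple(k.tolist())

def quantized_q_learning(P, dist, n_symbols, n, beta=0.9, steps=20000, seed=0):
    """Tabular Q-learning on quantized beliefs with uniform exploration (assumed setup)."""
    rng = np.random.default_rng(seed)
    n_source = P.shape[0]
    actions = all_quantizers(n_source, n_symbols)
    # Invariant distribution: left eigenvector of P for the eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi0 = np.abs(np.real(v[:, np.argmax(np.real(w))]))
    pi0 /= pi0.sum()
    belief, x = pi0.copy(), rng.choice(n_source, p=pi0)
    Q, visits = {}, {}
    for _ in range(steps):
        s = quantize_belief(belief, n)
        a = int(rng.integers(len(actions)))    # uniform exploration policy
        cost = stage_cost(belief, actions[a], n_symbols, dist)
        symbol = actions[a][x]                 # channel symbol actually sent
        belief = belief_update(belief, actions[a], symbol, P)
        x = rng.choice(n_source, p=P[x])       # source moves to its next state
        s_next = quantize_belief(belief, n)
        q_row = Q.setdefault(s, np.zeros(len(actions)))
        q_next = Q.setdefault(s_next, np.zeros(len(actions)))
        visits[(s, a)] = visits.get((s, a), 0) + 1
        alpha = 1.0 / visits[(s, a)]
        # Cost-minimizing Q-learning update on the quantized belief.
        q_row[a] += alpha * (cost + beta * q_next.min() - q_row[a])
    return Q, actions
```

After learning, a greedy policy would pick `actions[int(np.argmin(Q[s]))]` for a visited quantized belief `s`. The true belief is propagated exactly and only quantized for the table lookup, which is one common way to set up quantized Q-learning; other designs are possible.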

Abstract

In the classical lossy source coding problem, one encodes long blocks of source symbols, which enables the distortion to approach the ultimate Shannon limit. Such a block-coding approach introduces large delays, which are undesirable in many delay-sensitive applications. We consider the zero-delay case, where the goal is to encode and decode a finite-alphabet Markov source without any delay. It has been shown that this problem lends itself to stochastic control techniques, which lead to existence, structural, and general structural approximation results. However, these techniques have so far resulted only in computationally prohibitive algorithmic implementations for code design. To address this problem, we present a reinforcement learning design algorithm and rigorously prove its asymptotic optimality. In particular, we show that a quantized Q-learning algorithm can be used to obtain a near-optimal coding policy for this problem. The proof builds on recent results on quantized Q-learning for weakly Feller controlled Markov chains, whose application necessitates the development of supporting technical results on regularity and stability properties, and on relating the optimal solutions of the infinite-horizon discounted and average cost problems. These theoretical results are supported by simulations.
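
For orientation, the two infinite-horizon criteria mentioned above are usually written as follows. This is a sketch in standard notation, assuming a per-stage cost $c(\pi_t, Q_t)$ (the expected distortion under belief $\pi_t$ and quantizer $Q_t$), a policy $\gamma$, and a discount factor $\beta \in (0,1)$; the paper's own symbols and equation numbering may differ.

$$
J_\beta(\pi_0, \gamma) = E^{\gamma}_{\pi_0}\!\left[\sum_{t=0}^{\infty} \beta^t \, c(\pi_t, Q_t)\right],
\qquad
J(\pi_0, \gamma) = \limsup_{T \to \infty} \frac{1}{T}\, E^{\gamma}_{\pi_0}\!\left[\sum_{t=0}^{T-1} c(\pi_t, Q_t)\right].
$$

For a bounded nonnegative cost, a standard Abelian inequality gives $\limsup_{\beta \uparrow 1} (1-\beta)\, J_\beta(\pi_0, \gamma) \le J(\pi_0, \gamma)$, which is the kind of link between the two criteria that such arguments typically build on.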

Paper Structure (19 sections, 19 theorems, 65 equations, 4 figures)


Key Result

Proposition 1

[wood2016optimal] There exists an optimal policy $\gamma^* \in \Gamma_{\text{WS}}$ for the average cost problem (eq:average_cost). That is, there exists $\gamma^*\in \Gamma_{\text{WS}}$ such that

Figures (4)

  • Figure 1: Block diagram for our zero-delay coding system: $X_t$ is the source sample, $q_t$ is the encoded symbol transmitted through the noiseless channel, and $\hat{X}_t$ is the reconstructed source sample.
  • Figure 2: Comparison of Algorithm 1 with O-FSSQ for a finite Markov source.
  • Figure 3: Comparison of Algorithm 1 with O-FSSQ for a Gauss-Markov source.
  • Figure 4: Comparison of Algorithm 1 with Lloyd-Max for an i.i.d. Gaussian source.

Theorems & Definitions (23)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • Theorem 1: Discounted distortion
  • Theorem 2: Average distortion
  • Definition 2
  • Lemma 1
  • Theorem 3
  • Definition 3
  • Definition 4
  • ...and 13 more