
Linear Algebraic Truncation Algorithm with A Posteriori Error Bounds for Computing Markov Chain Equilibrium Gradients

Saied Mahdian, Peter W. Glynn

TL;DR

The paper tackles computing equilibrium reward gradients $\nabla_\theta \alpha(\theta)$ for Markov chains with large or infinite state spaces by introducing a truncation-based method with computable a posteriori error bounds. It leverages regeneration at a fixed return state $z$ to express the gradient as a ratio of regenerative sums, and develops Lyapunov-function–driven bounds to control path excursions outside the truncation set. A general theory for non-negative linear systems provides finite-dimensional, verifiable error bounds, which are then specialized to bound $w(\theta,z,f)$ and its derivative, enabling explicit bounds on the gradient and its derivative. The framework extends to Markov jump processes and is demonstrated numerically on a G/M/1 queue and a two-station Jackson network, showing tight gradient bounds with moderate truncations and practical applicability for sensitivity analysis and optimization in stochastic networks.
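
The regenerative representation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a formula quoted from the paper: it assumes that $w(\theta,z,f)$ denotes the expected cumulative reward $f$ collected over one regenerative cycle started at $z$, and it introduces $e$ for the constant function equal to 1, so that $w(\theta,z,e)$ is the expected cycle length. Under those assumptions, the standard ratio formula and the quotient rule give:

```latex
\alpha(\theta) = \frac{w(\theta,z,f)}{w(\theta,z,e)},
\qquad
\nabla_\theta \alpha(\theta)
  = \frac{\nabla_\theta w(\theta,z,f) - \alpha(\theta)\,\nabla_\theta w(\theta,z,e)}{w(\theta,z,e)} .
```

Read this way, bounds on $w$ and its derivative for the two choices of reward function translate into bounds on $\alpha(\theta)$ and its gradient.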

Abstract

The numerical computation of equilibrium reward gradients for Markov chains appears in many applications, for example within the policy improvement step of average-reward stochastic dynamic programming. When the state space is large or infinite, one will typically need to truncate the state space in order to arrive at a numerically tractable formulation. In this paper, we derive the first computable a posteriori error bounds for equilibrium reward gradients that account for the error induced by the truncation. Our approach uses regeneration to express equilibrium quantities in terms of the expectations of cumulative rewards over regenerative cycles. Lyapunov functions are then used to bound the contributions to these cumulative rewards and their gradients from path excursions that take the chain outside the truncation set. Our numerical results indicate that our approach can provide highly accurate bounds with truncation sets of moderate size. We further extend our approach to Markov jump processes.
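
To make the truncation idea concrete, here is a minimal Python sketch. It is not the paper's algorithm and it computes no error bounds: it only shows the plain truncated approximation that such bounds would be wrapped around. Everything in it is an assumption chosen for illustration: the toy chain (a random walk on the non-negative integers, reflected at 0, with up-probability `p` playing the role of $\theta$), the reward $f(x) = x$, the return state $z = 0$, and the function name `truncated_equilibrium_gradient`. Excursions that leave the truncation set $\{0, \dots, n-1\}$ are simply dropped.

```python
import numpy as np


def truncated_equilibrium_gradient(p, n, z=0):
    """Approximate alpha(p) and d alpha / d p on the truncation set {0, ..., n-1}.

    Toy chain (an assumption for illustration): from x the walk moves up with
    probability p and down with probability 1 - p; at x = 0 a down-move stays
    at 0. The reward is f(x) = x.
    """
    q = 1.0 - p
    P = np.zeros((n, n))   # transition matrix restricted to the truncation set
    dP = np.zeros((n, n))  # entrywise derivative of P with respect to p
    for x in range(n):
        if x + 1 < n:      # the up-move out of state n - 1 is dropped
            P[x, x + 1] = p
            dP[x, x + 1] = 1.0
        P[x, max(x - 1, 0)] = q
        dP[x, max(x - 1, 0)] = -1.0

    # Taboo matrix: zero the column of z so that paths stop on returning to z.
    B = P.copy()
    B[:, z] = 0.0
    dB = dP.copy()
    dB[:, z] = 0.0

    f = np.arange(n, dtype=float)  # reward function f(x) = x
    e = np.ones(n)                 # constant reward, gives expected cycle length
    A = np.eye(n) - B

    # Cycle sums solve w = r + B w, i.e. (I - B) w = r.
    w_f = np.linalg.solve(A, f)
    w_e = np.linalg.solve(A, e)
    # Differentiating (I - B) w = r in p gives (I - B) w' = B' w.
    dw_f = np.linalg.solve(A, dB @ w_f)
    dw_e = np.linalg.solve(A, dB @ w_e)

    alpha = w_f[z] / w_e[z]
    grad = (dw_f[z] * w_e[z] - w_f[z] * dw_e[z]) / w_e[z] ** 2  # quotient rule
    return alpha, grad


if __name__ == "__main__":
    p = 0.3
    alpha, grad = truncated_equilibrium_gradient(p, n=60)
    # Closed forms for this toy chain: alpha = p / (1 - 2p), gradient = 1 / (1 - 2p)^2.
    print(alpha, p / (1 - 2 * p))
    print(grad, 1 / (1 - 2 * p) ** 2)
```

For this toy chain the untruncated values are known in closed form, which makes the effect of `n` easy to inspect. With a non-negative reward, each truncated cycle sum is a lower bound on its untruncated counterpart, but the ratio of two lower bounds is not itself a bound; closing that gap is what the a posteriori error bounds described in the abstract are for.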

Paper Structure (9 sections, 4 theorems, 110 equations, 2 figures)


Key Result

Proposition 1

Figures (2)

  • Figure 1: G/M/1 queue: Numerical accuracy of our gradient bounds method versus $n$.
  • Figure 2: Network of queues: Numerical accuracy of our gradient bounds versus $n$.

Theorems & Definitions (7)

  • Proposition 1
  • proof
  • Remark 1
  • Theorem 1
  • proof
  • Proposition 2
  • Theorem 2