Ergodic-risk Criterion for Stochastically Stabilizing Policy Optimization
Shahriar Talebi, Na Li
TL;DR
The paper develops ergodic-risk criteria to quantify long-term cumulative risk in controlled Markov chains, addressing heavy-tailed and non-stationary settings by leveraging uniform ergodicity and tailored Functional Central Limit Theorems (FCLTs). It establishes existence and convergence results for the ergodic-risk metrics (C_∞, γ_C^2, γ_N^2) under affine stabilizing policies and derives a quadratic ergodic-risk COCP. It proposes a primal-dual algorithm, backed by strong duality, for risk-constrained policy optimization that optimizes average performance while constraining long-run risk, and validates it in several simulations. The framework extends risk-sensitive control to general-state, non-stationary processes, providing theoretical guarantees and a practical optimization strategy under heavy-tailed disturbances. The results pave the way for data-driven implementations and direct handling of long-term risk in complex stochastic control problems.
Abstract
This paper introduces ergodic-risk criteria, which capture long-term cumulative risks associated with controlled Markov chains through probabilistic limit theorems, in contrast to existing methods that require assumptions of either finite hitting time, finite state/action space, or exponentiation necessitating light-tailed distributions. Using tailored Functional Central Limit Theorems (FCLTs), we demonstrate that the time-correlated terms in the ergodic-risk criteria converge under uniform ergodicity and establish conditions for the convergence of these criteria in non-stationary general-state Markov chains involving heavy-tailed distributions. For quadratic risk functionals on stochastic linear systems, in addition to internal stability, this requires the (possibly heavy-tailed) process noise to have only a finite fourth moment. After quantifying cumulative uncertainties in risk functionals that account for extreme deviations, these ergodic-risk criteria are incorporated into policy optimization, thereby extending the standard average optimal synthesis to a risk-sensitive framework. Finally, by establishing the strong duality of the constrained policy optimization, we propose a primal-dual algorithm that optimizes average performance while ensuring that certain risks associated with these ergodic-risk criteria are constrained. Our risk-sensitive framework offers a theoretically guaranteed policy iteration for the long-term risk-sensitive control of processes involving heavy-tailed noise, and several simulations demonstrate its effectiveness.
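To make the finite-fourth-moment setting concrete, the following is a minimal sketch, not the paper's algorithm. It simulates a hypothetical stable scalar system x_{t+1} = a x_t + w_t, started away from stationarity at x_0 = 0, with Student-t noise whose 5 degrees of freedom give a finite fourth moment but heavy tails. It then compares the time-averaged quadratic cost x_t^2 with its stationary value Var(w)/(1 - a^2). The system, the parameter values, and the batch-means estimator are all assumptions introduced for illustration; the batch-means quantity is a generic long-run variance of the running cost and is not claimed to coincide with the paper's γ_C^2 or γ_N^2.

```python
import math
import random
import statistics


def student_t(rng, df):
    """Draw one Student-t sample as Z / sqrt(V / df), with V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape=df/2, scale=2)
    return z / math.sqrt(v / df)


def simulate_costs(a, df, steps, seed=0):
    """Return quadratic costs x_t^2 along x_{t+1} = a * x_t + w_t, with x_0 = 0."""
    rng = random.Random(seed)
    x, costs = 0.0, []
    for _ in range(steps):
        x = a * x + student_t(rng, df)
        costs.append(x * x)
    return costs


def batch_means_variance(samples, num_batches):
    """Batch-means estimate of the long-run variance of the sample mean."""
    m = len(samples) // num_batches  # batch length; leftover samples are dropped
    means = [statistics.fmean(samples[i * m:(i + 1) * m]) for i in range(num_batches)]
    return m * statistics.variance(means)


# Hypothetical parameters: |A| < 1 for stability, DF > 4 for a finite fourth moment.
A, DF, STEPS = 0.5, 5.0, 200_000

costs = simulate_costs(A, DF, STEPS)
avg_cost = statistics.fmean(costs)
stationary_cost = (DF / (DF - 2.0)) / (1.0 - A * A)  # Var(w) / (1 - a^2)
long_run_var = batch_means_variance(costs, num_batches=200)

print(f"time-averaged cost : {avg_cost:.3f}")
print(f"stationary cost    : {stationary_cost:.3f}")
print(f"batch-means long-run variance estimate: {long_run_var:.2f}")
```

Lowering `DF` to 4 or below removes the finite fourth moment, in which case the variance of x_t^2 is no longer finite and the batch-means estimate should not be expected to stabilize as `STEPS` grows.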
