Ergodic-risk Criterion for Stochastically Stabilizing Policy Optimization

Shahriar Talebi; Na Li

TL;DR

The paper develops ergodic-risk criteria that quantify long-term cumulative risk in controlled Markov chains. Uniform ergodicity and tailored Functional Central Limit Theorems (FCLTs) allow the framework to handle heavy-tailed and nonstationary settings. The paper establishes existence and convergence of the ergodic-risk metrics ($C_\infty$, $\gamma_C^2$, $\gamma_N^2$) under affine stabilizing policies and derives a quadratic ergodic-risk constrained optimal control problem (COCP). It proposes a primal-dual algorithm, supported by strong duality, for risk-constrained policy optimization that preserves average performance while limiting long-run risk, and validates it in simulations that include Grumman X-29 aircraft dynamics. The framework extends risk-sensitive control to general-state, nonstationary processes, with theoretical guarantees and a practical optimization strategy for heavy-tailed disturbances. The results pave the way for data-driven implementations and direct handling of long-term risk in complex stochastic control problems.

Abstract

This paper introduces ergodic-risk criteria, which capture long-term cumulative risks associated with controlled Markov chains through probabilistic limit theorems, in contrast to existing methods that require assumptions of either finite hitting time, finite state/action space, or exponentiation necessitating light-tailed distributions. Using tailored Functional Central Limit Theorems (FCLTs), we demonstrate that the time-correlated terms in the ergodic-risk criteria converge under uniform ergodicity and establish conditions for the convergence of these criteria in non-stationary general-state Markov chains involving heavy-tailed distributions. For quadratic risk functionals on stochastic linear systems, in addition to internal stability, this requires the (possibly heavy-tailed) process noise to have only a finite fourth moment. After quantifying cumulative uncertainties in risk functionals that account for extreme deviations, these ergodic-risk criteria are then incorporated into policy optimization, thereby extending the standard average optimal synthesis to a risk-sensitive framework. Finally, by establishing the strong duality of the constrained policy optimization, we propose a primal-dual algorithm that optimizes average performance while ensuring that certain risks associated with these ergodic-risk criteria are constrained. Our risk-sensitive framework offers a theoretically guaranteed policy iteration for the long-term risk-sensitive control of processes involving heavy-tailed noise, which is shown to be effective through several simulations.
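
The primal-dual idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch and not the paper's algorithm: the quadratic objective $\tfrac12\|x-c\|^2$ is a placeholder for average performance, the single linear constraint $a^\top x \le b$ is a placeholder for a risk bound, and the function name `primal_dual`, the step sizes, and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def primal_dual(c, a, b, eta=0.1, rho=0.1, iters=3000):
    """Toy primal-dual iteration (illustrative placeholder problem):
    minimize 0.5 * ||x - c||^2  subject to  a @ x <= b.
    """
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(c)   # primal variable (stand-in for policy parameters)
    lam = 0.0              # dual variable (multiplier of the constraint)
    for _ in range(iters):
        # Primal step: gradient descent on the Lagrangian in x.
        grad_x = (x - c) + lam * a
        x = x - eta * grad_x
        # Dual step: projected gradient ascent on the Lagrangian in lam.
        lam = max(0.0, lam + rho * (a @ x - b))
    return x, lam

x_opt, lam_opt = primal_dual(c=[2.0, 1.0], a=[1.0, 1.0], b=1.0)
```

For this placeholder problem the constrained minimizer is the projection of `c` onto the half-space, so the iterates can be checked against a closed-form answer. In the paper's setting the objective and constraint are the average cost and the ergodic-risk criteria, whose expressions are not reproduced here.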

Paper Structure (17 sections, 8 theorems, 78 equations, 5 figures, 1 algorithm)

Introduction
Problem Setup
Ergodic-risk Constrained Optimal Control Problem
Existence of the Ergodic-risk Criteria
Setting up the Existence Analysis
$V$-uniform Ergodicity of LTI Systems
The tailored Functional LLN and CLT
Proofs of the $V$-uniform ergodicity and tailored CLT theorems, and the resulting Theorem 3.3
Policy Optimization for Quadratic Ergodic-risk COCP
Constrained Policy Optimization
Quadratic Ergodic-risk Criteria with $R^c=0$ and $\ell = 0$
Strong Duality
The Algorithm
Convergence Guarantee
Simulations
...and 2 more sections

Key Result

Theorem 3.3

Suppose the stability and noise assumptions hold, and consider the chain $\mathbf{\Phi}^\pi$ for any policy $(K,\ell) \in \mathcal{S}\times\mathbb{R}^n$ that is stabilizing and for which $(A_K,H)$ is controllable. Consider the risk functional $g$ in eq:def-quadrati…

Figures (5)

Figure 1: The conditional expectation $\mathbb{E}[g(X_t,U_t) \mid \mathcal{F}_{t-1}]$ is the orthogonal projection (in blue) of the risk functional $g(X_t,U_t)$ onto $\mathcal{L}^2(\mathcal{F}_{t-1})$, i.e., its best estimate given the information up to time $t-1$. It solves $\arg\min_{\hat{g} \in \mathcal{L}^2(\mathcal{F}_{t-1})}\|g - \hat{g}\|_{\mathcal{L}^2(\mathcal{F}_{t-1})}$, where $\|g - \hat{g}\|_{\mathcal{L}^2(\mathcal{F}_{t-1})} = \sqrt{\mathbb{E}[(g - \hat{g})^2]}$. The random variable $C_t$ (in red) then retains the "uncertain part" of the risk functional at time $t$.
Figure 1: Convergence of Algorithm 1 to the optimal ergodic-risk policy for Grumman X-29 aircraft dynamics.
Figure 2: The optimal ergodic-risk policy versus the optimal LQR policy for Grumman X-29 aircraft dynamics under simulated gust disturbances applied every 200 time steps.
Figure 3: The average of the running covariance $S_t^2/t$ over 10,000 system responses under the LQR policy versus the ergodic-risk optimal policy, approximating the corresponding asymptotic variance $\gamma_C^2$ as $t$ approaches infinity.
Figure 4: Convergence of Algorithm 1 on 50 randomly sampled problem instances, measured by the error in the KKT conditions versus iteration $m$.
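
The quantities named in the captions above (the uncertain part $C_t$ and the averaged running statistic $S_t^2/t$) can be made concrete with a small simulation. This is a hedged sketch, not the paper's experiment. It assumes a scalar stable system $x_{t+1} = a x_t + w_{t+1}$ with Student-t noise with 5 degrees of freedom (heavy-tailed, but with a finite fourth moment), the quadratic functional $g(x) = q x^2$, and that $S_t$ denotes the partial sum $C_1 + \dots + C_t$. The function name `estimate_asymptotic_variance` and all parameter values are hypothetical.

```python
import numpy as np

def estimate_asymptotic_variance(a=0.8, q=1.0, df=5.0,
                                 n_paths=2000, horizon=500, seed=0):
    """Monte Carlo estimate of E[S_t^2] / t for a toy scalar system.

    Assumptions (illustrative only): x_{t+1} = a * x_t + w_{t+1},
    w ~ Student-t(df), g(x) = q * x^2, S_t = C_1 + ... + C_t.
    """
    if abs(a) >= 1.0:
        raise ValueError("the toy system must be stable: |a| < 1")
    if df <= 4.0:
        raise ValueError("df must exceed 4 for a finite fourth moment")
    rng = np.random.default_rng(seed)
    sigma2 = df / (df - 2.0)          # variance of Student-t noise
    x = np.zeros(n_paths)             # start away from stationarity
    s = np.zeros(n_paths)             # running sums S_t, one per path
    c_mean_sum = 0.0
    for _ in range(horizon):
        w = rng.standard_t(df, size=n_paths)
        x_next = a * x + w
        # Conditional expectation of g(x_next) given the past:
        # E[q * (a*x + w)^2 | x] = q * ((a*x)^2 + sigma2).
        predicted = q * ((a * x) ** 2 + sigma2)
        # C_t: the part of g not predictable from past information.
        c = q * x_next ** 2 - predicted
        s += c
        c_mean_sum += c.mean()
        x = x_next
    gamma_hat = float(np.mean(s ** 2) / horizon)
    return gamma_hat, c_mean_sum / horizon

gamma_hat, c_mean = estimate_asymptotic_variance()
```

Here `c_mean` should be close to zero because each $C_t$ has zero conditional mean by construction, while `gamma_hat` plays the role of the averaged $S_t^2/t$ curve at a single horizon. Comparing policies, as in Figure 3, would require the paper's system matrices and policies, which are not given here.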

Theorems & Definitions (20)

Remark 2.1
Theorem 3.3
Remark 3.4
Lemma 3.5
Lemma 3.6
Definition 3.7
Definition 3.8
Theorem 3.9
Theorem 3.10
Proof 1: Proof of the $V$-uniform ergodicity theorem
...and 10 more
