Ergodic-Risk Constrained Policy Optimization: The Linear Quadratic Case

Shahriar Talebi, Na Li

TL;DR

This work addresses long-horizon risk in stochastic control with heavy-tailed noise by introducing ergodic-risk criteria that capture cumulative uncertainty through $C_\infty$ and its asymptotic variance $\gamma_N^2$. For LTI systems, it derives a quadratic ergodic-risk formulation and a tractable surrogate $\gamma_N^2(K)$, enabling a constrained LQR problem that minimizes the average cost $J(K)=\mathrm{tr}(Q_K \Sigma_K)$ subject to $\gamma_N^2(K)\le\bar{\beta}$. A primal-dual algorithm leveraging strong duality and a fast inner loop computes an optimal stabilizing policy $K^*(\lambda)$ with provable convergence rates. Numerical experiments on a Grumman X-29 model with heavy-tailed noise demonstrate a modest increase in average cost but enhanced resilience to large disturbances, highlighting the method’s practical value for risk-aware control in uncertain environments.
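The average cost $J(K)=\mathrm{tr}(Q_K \Sigma_K)$ can be evaluated numerically once $\Sigma_K$ is known. The following is a minimal sketch, assuming the standard LQR conventions $U_t = K X_t$, $A_K = A + BK$, $Q_K = Q + K^\top R K$, and that $\Sigma_K$ solves the discrete Lyapunov equation $\Sigma_K = A_K \Sigma_K A_K^\top + \Sigma_W$ with noise covariance $\Sigma_W$. None of these definitions are spelled out in this summary, so the names `A`, `B`, `Q`, `R`, and `Sigma_W` are illustrative assumptions. The risk surrogate $\gamma_N^2(K)$ is not computed here.

```python
import numpy as np

def stationary_covariance(A_K, Sigma_W):
    """Solve Sigma = A_K Sigma A_K^T + Sigma_W by vectorization (assumed Lyapunov form)."""
    n = A_K.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A_K))) >= 1.0:
        raise ValueError("K is not stabilizing: spectral radius of A_K >= 1")
    # Row-major vec identity: vec(A X A^T) = (A kron A) vec(X).
    lhs = np.eye(n * n) - np.kron(A_K, A_K)
    return np.linalg.solve(lhs, Sigma_W.reshape(-1)).reshape(n, n)

def average_cost(A, B, Q, R, K, Sigma_W):
    """Return J(K) = tr(Q_K Sigma_K) under the assumed convention U_t = K X_t."""
    A_K = A + B @ K
    Q_K = Q + K.T @ R @ K
    Sigma_K = stationary_covariance(A_K, Sigma_W)
    return float(np.trace(Q_K @ Sigma_K))

# Hypothetical scalar system: A_K = 1.2 - 0.7 = 0.5, so Sigma_K = 1 / (1 - 0.25).
A = np.array([[1.2]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
K = np.array([[-0.7]])
Sigma_W = np.array([[1.0]])
J = average_cost(A, B, Q, R, K, Sigma_W)
```

In the scalar example, $Q_K = 1 + 0.49 = 1.49$ and $\Sigma_K = 4/3$, so `J` equals $1.49 \cdot 4/3$.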

Abstract

Risk-sensitive control balances performance with resilience to unlikely events in uncertain systems. This paper introduces ergodic-risk criteria, which capture long-term cumulative risks through probabilistic limit theorems. By ensuring the dynamics exhibit strong ergodicity, we demonstrate that the time-correlated terms in these limiting criteria converge even with potentially heavy-tailed process noise, as long as the noise has a finite fourth moment. Building upon this, we propose ergodic-risk constrained policy optimization, which incorporates an ergodic-risk constraint into the classical Linear Quadratic Regulation (LQR) framework. We then propose a primal-dual policy optimization method that optimizes the average performance while satisfying the ergodic-risk constraints. Numerical results demonstrate that the new risk-constrained LQR not only optimizes average performance but also limits the asymptotic variance associated with the ergodic-risk criterion, making the closed-loop system more robust against sporadic large fluctuations in process noise.
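The primal-dual idea can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it replaces the LQR cost and the ergodic-risk constraint with hypothetical scalar stand-ins, $f(x) = (x-2)^2$ and $g(x) = x^2 \le \beta$, chosen only because the inner minimizer of the Lagrangian has a closed form. The outer loop is plain projected dual ascent on the multiplier $\lambda$.

```python
def inner_minimizer(lam):
    """Closed-form argmin over x of (x - 2)^2 + lam * (x^2 - beta); beta drops out."""
    return 2.0 / (1.0 + lam)

def dual_ascent(beta=1.0, step=0.5, iters=200):
    """Projected dual ascent: raise lam while the constraint x^2 <= beta is violated."""
    lam = 0.0
    for _ in range(iters):
        x = inner_minimizer(lam)                      # primal (inner) step
        lam = max(0.0, lam + step * (x * x - beta))   # dual step, projected onto lam >= 0
    return inner_minimizer(lam), lam

x_star, lam_star = dual_ascent()
```

For $\beta = 1$ the constraint is active at the optimum, so the iterates approach $x = 1$ and $\lambda = 1$. In the paper's setting the role of `inner_minimizer` would be played by the inner loop that computes $K^*(\lambda)$.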

Paper Structure

This paper contains 11 sections, 3 theorems, 28 equations, 2 figures, 1 algorithm.

Key Result

Lemma 3

Under Assumptions assmp:noise and assmp:stability, for any stabilizing policy $K \in \mathcal{S}$, we have the following limits as $t\to\infty$: $\mathbb{E}[X_t] \to 0$, $\mathbb{E}[\Lambda_t/t] \to 0$, and $\mathbb{E}[\Gamma_t/t] \to \Sigma_K$, where $\Sigma_K$ is the unique [...]. Furthermore, we obtain that $\Lambda_t/t \xrightarrow{p} 0$ as $t\to\infty$, and $\{X_t$ [...]
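The limit $\mathbb{E}[\Gamma_t/t] \to \Sigma_K$ can be checked empirically on a scalar example. The sketch below makes several assumptions that this excerpt does not state: that $\Gamma_t = \sum_{s=1}^{t} X_s X_s^\top$, that the closed loop is $X_{s} = a_K X_{s-1} + W_s$, and that $\Sigma_K$ is the stationary variance $\mathrm{Var}(W)/(1-a_K^2)$. The noise is Student's $t$ with 5 degrees of freedom, which is heavy-tailed but has a finite fourth moment.

```python
import numpy as np

def time_averaged_second_moment(a_K, T, df=5.0, seed=0):
    """Simulate x_s = a_K * x_{s-1} + w_s and return (1/T) * sum of x_s^2."""
    rng = np.random.default_rng(seed)
    w = rng.standard_t(df, size=T)   # heavy-tailed noise; fourth moment finite for df > 4
    x = 0.0
    gamma = 0.0                      # running sum, the assumed scalar Gamma_t
    for t in range(T):
        x = a_K * x + w[t]
        gamma += x * x
    return gamma / T

a_K = 0.5                                # hypothetical stable closed-loop gain
df = 5.0
noise_var = df / (df - 2.0)              # variance of Student's t with df > 2
sigma_K = noise_var / (1.0 - a_K ** 2)   # stationary variance (assumed Sigma_K)
estimate = time_averaged_second_moment(a_K, T=200_000, df=df)
```

With a long enough horizon, `estimate` should land close to `sigma_K`. Lowering `df` toward 4 slows this convergence noticeably, which is consistent with the finite-fourth-moment requirement.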

Figures (2)

  • Figure 1: The conditional expectation (in blue) is the orthogonal projection of $g(X_t, U_t)$ onto $\mathcal{L}^2(\mathcal{F}_{t-1})$, i.e., its best estimate given the information available up to time $t-1$, solving $\arg\min_{\hat{g} \in \mathcal{L}^2(\mathcal{F}_{t-1})} \sqrt{\mathbb{E}[(g - \hat{g})^2]}$. Hence $C_t$ (in red) retains the "uncertain component" of $g(X_t, U_t)$.
  • Figure 2: Comparison of the optimal ergodic-risk and optimal LQR policies for the Grumman X-29 aircraft under Student’s $t$-noise and simulated gust disturbances occurring every 500 time steps.
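
The projection in Figure 1 can be worked out explicitly in the linear quadratic case. The derivation below is a sketch under assumptions not stated in this summary: closed-loop dynamics $X_t = A_K X_{t-1} + W_t$, noise $W_t$ that is zero-mean with covariance $\Sigma_W$ and independent of $\mathcal{F}_{t-1}$, and a quadratic stage cost $g(X_t, U_t) = X_t^\top Q_K X_t$ with symmetric $Q_K$ under $U_t = K X_t$. The paper's own indexing and noise model may differ.

```latex
\begin{align*}
\mathbb{E}\big[g(X_t,U_t)\mid\mathcal{F}_{t-1}\big]
  &= X_{t-1}^\top A_K^\top Q_K A_K X_{t-1} + \mathrm{tr}(Q_K \Sigma_W), \\
C_t = g(X_t,U_t) - \mathbb{E}\big[g(X_t,U_t)\mid\mathcal{F}_{t-1}\big]
  &= 2\, W_t^\top Q_K A_K X_{t-1} + W_t^\top Q_K W_t - \mathrm{tr}(Q_K \Sigma_W).
\end{align*}
```

Both remaining terms have zero conditional mean given $\mathcal{F}_{t-1}$, so $C_t$ is orthogonal to $\mathcal{L}^2(\mathcal{F}_{t-1})$, as the figure depicts. The term $W_t^\top Q_K W_t$ is quadratic in the noise, which suggests why a finite fourth moment is needed for $C_t$ to have finite variance.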

Theorems & Definitions (5)

  • Lemma 3
  • Theorem 4
  • Remark 5
  • proof
  • Lemma 7