Primal-Only Actor Critic Algorithm for Robust Constrained Average Cost MDPs

Anirudh Satheesh, Sooraj Sathish, Swetha Ganesh, Keenan Powell, Vaneet Aggarwal

TL;DR

This work proposes an actor-critic algorithm for Average-Cost RCMDPs that achieves both \(\epsilon\)-feasibility and \(\epsilon\)-optimality, and establishes sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\) with and without a slackness assumption, respectively, which are comparable to those in the discounted setting.

Abstract

In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of standard primal-dual methods for constrained RL. Additional difficulties arise from the average-cost setting, where the robust Bellman operator is not a contraction under any norm. To address these challenges, we propose an actor-critic algorithm for Average-Cost RCMDPs. We show that our method achieves both \(\epsilon\)-feasibility and \(\epsilon\)-optimality, and we establish sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\) with and without a slackness assumption, respectively, which are comparable to those in the discounted setting.

Paper Structure

This paper contains 23 sections, 12 theorems, 43 equations, 5 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

If $(g, V)$ is a solution to the robust Bellman equation, where $\sigma_{\mathcal{P}^a_s} = \min_{P \in \mathcal{P}^a_s}$ denotes the support function, then the scalar $g$ corresponds to the robust average cost, i.e., $g = g_\mathcal{P}^\pi$, and the worst-case transition kernel $P_V$ belongs to the set of minimizing transition kernels, i.e., $P_V \i
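
As an illustration of the support function named in Theorem 1, the sketch below evaluates it for a contamination uncertainty set (the set used in Figure 1), following the min form written above. It assumes the set is $\{(1-\delta) P_0 + \delta q : q \text{ a distribution over states}\}$, for which the minimum of $P^\top V$ has the closed form $(1-\delta) P_0^\top V + \delta \min_s V(s)$. The function name, the nominal distribution `p_nominal`, the radius `delta`, and the numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def contamination_support(p_nominal, v, delta):
    """Minimum of P @ v over the contamination set around p_nominal.

    Assumed set (illustrative): {(1 - delta) * p_nominal + delta * q},
    with q ranging over all probability distributions on the states.
    The minimizing q puts all its mass on a state with the smallest v.
    """
    p_nominal = np.asarray(p_nominal, dtype=float)
    v = np.asarray(v, dtype=float)
    return (1.0 - delta) * (p_nominal @ v) + delta * v.min()

# Hypothetical 3-state example.
p0 = np.array([0.5, 0.3, 0.2])   # nominal next-state distribution
v = np.array([1.0, 2.0, 4.0])    # relative value function
sigma = contamination_support(p0, v, delta=0.1)
nominal = contamination_support(p0, v, delta=0.0)  # reduces to p0 @ v
```

If the worst case is instead taken as a maximum, the same closed form holds with `v.max()` in place of `v.min()`.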

Figures (5)

  • Figure 1: Performance of the Robust Constrained Average-Cost Actor-Critic algorithm under the Contamination uncertainty set.
  • Figure 2: Performance of the Robust Constrained Average-Cost Actor-Critic algorithm under the Total Variation (TV) uncertainty set.
  • Figure 3: Performance of the Robust Constrained Average-Cost Actor-Critic algorithm under the Wasserstein uncertainty set.
  • Figure 4: Performance of the Robust Constrained Average-Cost Actor-Critic algorithm under the TV uncertainty set with larger uncertainty set radii.
  • Figure 5: Performance of the Robust Constrained Average-Cost Actor-Critic algorithm under the Wasserstein uncertainty set with larger uncertainty set radii.

Theorems & Definitions (13)

  • Theorem 1: Robust Bellman Equation, Theorem 3.1 in wang2023model
  • Theorem 2: Robust Bellman Operator JMLR:v25:23-0526
  • Definition 3: Definition 3.1 in NEURIPS2024_1f28e934
  • Lemma 4: Lemma 3.2 in NEURIPS2024_1f28e934
  • Theorem 5: Theorem 5.3 in xu2025efficient
  • Lemma 6
  • Lemma 7
  • Theorem 8
  • Lemma 9: Proof of the lemma on optimality and feasibility of the F policy
  • Lemma 10: Restatement of the lemma on the Q function derivation
  • ...and 3 more