Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

TL;DR

This work tackles non-rectangular robust MDPs with $L_p$-bounded kernel uncertainty. Although non-rectangular robust policy evaluation is NP-hard in general, the authors show that the $L_p$-bounded class $\mathcal{U}_p$ can be decomposed into infinitely many sa-rectangular sets, which enables a novel dual formulation. They derive a dual representation for robust MDPs, show that the adversary's worst-case kernel is always a rank-one perturbation, and propose robust policy evaluation via a fixed-point binary search that achieves linear convergence. They further develop policy gradient methods that leverage the dual structure, provide a practical spectral algorithm for $p=2$ with favorable complexity, and validate the approach with experiments in which it outperforms brute-force baselines. The results offer a foundation for scalable robust RL under non-rectangular uncertainty and open avenues for extending to broader uncertainty sets and integrating with deep RL.

Abstract

We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many \texttt{sa}-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.
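The decomposition claim in the abstract can be illustrated with a small geometric check. The sketch below is not the authors' construction: it assumes a plain $L_p$ ball of radius `beta` in $\mathbb{R}^n$ and ignores the constraint that perturbed kernel rows must remain probability distributions. Under that assumption, the ball equals the union of the axis-aligned boxes $\{y : |y_i| \le b_i\}$ over all $b \ge 0$ with $\|b\|_p = \beta$. The names `covering_box` and `in_box` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def covering_box(x, beta, p):
    """Return half-widths b >= 0 with ||b||_p == beta such that |x_i| <= b_i.

    Illustrative only: assumes an unconstrained L_p ball, not the paper's
    kernel uncertainty set.
    """
    x = np.asarray(x, dtype=float).ravel()
    norm = np.linalg.norm(x, ord=p)
    if norm > beta:
        raise ValueError("x lies outside the L_p ball of radius beta")
    if norm == 0.0:
        # Any admissible b contains the origin; spread the radius evenly.
        return np.full(x.shape, beta / x.size ** (1.0 / p))
    # Scale |x| up to the boundary of the ball; x then sits inside the box.
    return np.abs(x) * (beta / norm)


def in_box(y, b, tol=1e-12):
    """Check the rectangular (coordinate-wise) constraint |y_i| <= b_i."""
    return bool(np.all(np.abs(np.asarray(y, dtype=float)) <= b + tol))


# Usage: a point strictly inside the L_2 ball is covered by one box,
# and every point of that box stays inside the ball.
rng = np.random.default_rng(0)
x = rng.normal(size=6)
x = 0.7 * x / np.linalg.norm(x, ord=2)
b = covering_box(x, beta=1.0, p=2)
y = rng.uniform(-1.0, 1.0, size=6) * b  # arbitrary point of the box
```

Each box is "rectangular" in the sense that its constraints decouple across coordinates, while the ball couples them through a single norm budget. This mirrors, in a simplified setting, the contrast between rectangular and non-rectangular uncertainty sets.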

Paper Structure

This paper contains 56 sections, 37 theorems, 98 equations, 20 figures, 3 tables, 5 algorithms.

Key Result

Proposition 2.1

(Nature of the Adversary, LpPgRMDP) For the uncertainty set $\mathcal{U} = \mathcal{U}^{sa}_p$ or $\mathcal{U}^s_p$, the worst kernel is expressed in terms of $k$ and $b$, where $k$ depends on the robust value function $v^\pi_\mathcal{U}$ and $b$ is the radius vector (policy-weighted in the case of $\mathcal{U}^{\texttt{s}}_p$).

Figures (20)

  • Figure 1: Modeling Uncertainty with Non-Rectangular and Rectangular $L_2$-Balls.
  • Figure 2: Illustration of Proposition \ref{main:rs:Set:sa2nr}: an $n$-dimensional sphere can be written as an infinite union of inscribed $n$-dimensional cubes.
  • Figure 3: Projections of the set $\mathcal{D}$ along principal components, for $S=3$, $A=2$ with 10 million samples. This figure strongly suggests that the set is non-convex.
  • Figure 4: Performance of robust policy evaluation methods given equal running time, with a fixed action-space size $A=8$.
  • Figure 5: Convergence of robust policy evaluation methods, with fixed $S=128$, $A=8$.
  • ...and 15 more figures

Theorems & Definitions (57)

  • Proposition 2.1
  • Proposition 3.1
  • Proposition 3.2
  • Lemma 3.3
  • Theorem 3.4
  • Lemma 3.5
  • Theorem 3.6
  • Theorem 4.1: Non-rectangular Worst Kernel
  • Lemma 5.1
  • Theorem 5.2
  • ...and 47 more