Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

Navdeep Kumar; Adarsh Gupta; Maxence Mohamed Elfatihi; Giorgia Ramponi; Kfir Yehuda Levy; Shie Mannor

TL;DR

This work tackles non-rectangular robust MDPs with $L_p$-bounded kernel uncertainty. Although non-rectangular robust policy evaluation is NP-hard in general, the authors show that the $L_p$-bounded uncertainty set $\mathcal{U}_p$ can be decomposed into infinitely many sa-rectangular sets, which enables a novel dual formulation. From this dual representation they show that the adversary's worst-case kernel is always a rank-one perturbation, and they propose robust policy evaluation via a fixed-point binary search that achieves linear convergence. They further develop policy gradient methods that leverage the dual structure, provide a practical spectral algorithm for $p=2$ with favorable complexity, and validate the approach with experiments in which it outperforms brute-force baselines. The results offer a promising foundation for scalable robust RL under non-rectangular uncertainty and open avenues for extensions to broader uncertainty sets and integration with deep RL.

Abstract

We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which, unlike traditional rectangular models, capture interdependencies across states. While non-rectangular robust policy evaluation is generally NP-hard, even to approximate, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many \texttt{sa}-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.
