Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes
Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor
TL;DR
This work studies non-rectangular robust MDPs with $L_p$-bounded kernel uncertainty. Although robust policy evaluation over general non-rectangular uncertainty sets is NP-hard, the authors show that the class of $L_p$-bounded sets $\mathcal{U}_p$ can be decomposed into infinitely many sa-rectangular sets, which enables a novel dual formulation. From this dual representation they show that the adversary's worst-case kernel is always a rank-one perturbation of the nominal kernel, and they propose a robust policy evaluation method based on a fixed-point binary search that converges linearly. They further develop policy gradient methods that exploit the dual structure, give a practical spectral algorithm for the $p=2$ case with favorable complexity, and report experiments in which the approach outperforms brute-force baselines. The results offer a foundation for scalable robust RL under non-rectangular uncertainty and suggest extensions to broader uncertainty sets and to deep RL.
Abstract
We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many \texttt{sa}-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.
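To make the objects in the summary concrete, the following minimal sketch builds a nominal kernel, adds a rank-one perturbation whose entrywise $L_p$ norm equals a budget, and compares the policy's discounted return under both kernels. It is an illustration under stated assumptions, not the authors' algorithm: the perturbation direction is random rather than the adversary's worst case, the norm is assumed to be the entrywise $L_p$ norm over the whole kernel, and all names (`policy_return`, `rank_one_perturbation`, `beta`, `b`, `k`) are hypothetical.

```python
import numpy as np

def policy_return(P, r, pi, mu, gamma):
    """Discounted return mu^T (I - gamma P_pi)^{-1} r_pi of a fixed policy.

    P: (S, A, S) kernel, r: (S, A) rewards, pi: (S, A) policy, mu: (S,) start.
    """
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward under pi
    v = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)
    return float(mu @ v)

def rank_one_perturbation(b, k, beta, p=2.0):
    """Return delta[s, a, s'] = c * b[s, a] * k[s'] with entrywise L_p norm beta.

    k is centred so each row of delta sums to zero, which keeps every row
    of the perturbed kernel summing to one.
    """
    k = k - k.mean()
    delta = b[:, :, None] * k[None, None, :]
    return beta * delta / np.linalg.norm(delta.ravel(), ord=p)

rng = np.random.default_rng(0)
S, A, gamma, beta = 4, 2, 0.9, 0.05

# Nominal kernel mixed with the uniform kernel so every entry is >= 0.125,
# which exceeds beta and therefore keeps the perturbed kernel nonnegative.
P0 = 0.5 * rng.dirichlet(np.ones(S), size=(S, A)) + 0.5 / S
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)   # uniform policy (assumption for the demo)
mu = np.full(S, 1.0 / S)        # uniform initial distribution

# Arbitrary rank-one direction; NOT the worst case derived in the paper.
delta = rank_one_perturbation(rng.normal(size=(S, A)), rng.normal(size=S), beta)
P = P0 + delta

nominal = policy_return(P0, r, pi, mu, gamma)
perturbed = policy_return(P, r, pi, mu, gamma)
print(f"nominal return {nominal:.4f}, perturbed return {perturbed:.4f}")
```

Viewed as an $(SA) \times S$ matrix, `delta` has rank one, and coupling all state-action pairs through a single norm budget is what makes the set non-rectangular. Finding the direction that minimizes the return is the problem the paper's dual formulation addresses; this sketch only evaluates one feasible kernel.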
