Efficient Algorithms for Robust Markov Decision Processes with $s$-Rectangular Ambiguity Sets

Chin Pang Ho; Marek Petrik; Wolfram Wiesemann

TL;DR

The paper presents a unified framework for solving robust MDPs with $s$-rectangular ambiguity sets by reducing robust Bellman updates to structured projection subproblems. It provides exact efficient algorithms for weighted $1$- and $2$-norm ambiguity, and scalable approximate methods for common $\phi$-divergence sets (KL and Burg entropy), with clear complexity guarantees. The approach yields substantial practical speedups over state-of-the-art solvers across synthetic and benchmark MDPs, enabling scalable robust planning while preserving performance. The work integrates theoretical developments with comprehensive numerical experiments and demonstrates broad applicability to distributional robustness in dynamic programming contexts.
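The projection subproblem mentioned above can be written out from the description in the caption of Figure 1. The following is a reconstruction rather than a quotation: the symbols $\bm{b}$, $\beta$, $\overline{\bm{p}}_{sa}$, $\Delta_S$ and $d_a$ come from the caption and from Theorem 1, while the name $\mathfrak{P}$ and the exact form of the objective are assumptions.

```latex
% Generalized d_a-projection (sketch; the name \mathfrak{P} is assumed):
% find the distribution closest to the nominal one, measured by d_a,
% among all distributions in the simplex that satisfy one halfspace constraint.
\mathfrak{P}(\overline{\bm{p}}_{sa}; \bm{b}, \beta)
  \;=\;
  \min_{\bm{p}_{sa} \in \Delta_S}
  \left\{ d_a(\bm{p}_{sa}, \overline{\bm{p}}_{sa})
          \;:\; \bm{b}^\top \bm{p}_{sa} \leq \beta \right\}
```

Under this reading, $d_a$ would be a weighted $1$-norm or $2$-norm distance, or a $\phi$-divergence such as the KL divergence or Burg entropy, depending on the ambiguity set.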

Abstract

Robust Markov decision processes (MDPs) have attracted significant interest due to their ability to protect MDPs from poor out-of-sample performance in the presence of ambiguity. In contrast to classical MDPs, which account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, a robust MDP additionally accounts for ambiguity by optimizing against the most adverse transition kernel from an ambiguity set constructed via historical data. In this paper, we develop a unified solution framework for a broad class of robust MDPs with $s$-rectangular ambiguity sets, where the most adverse transition probabilities are considered independently for each state. Using our algorithms, we show that $s$-rectangular robust MDPs with $1$- and $2$-norm as well as $\phi$-divergence ambiguity sets can be solved several orders of magnitude faster than with state-of-the-art commercial solvers, and often only a logarithmic factor slower than classical MDPs. We demonstrate the favorable scaling properties of our algorithms on a range of synthetically generated as well as standard benchmark instances.
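To make "optimizing against the most adverse transition kernel" concrete, here is a minimal Python sketch of robust value iteration. It is not the paper's method: it assumes the simpler $(s,a)$-rectangular setting with an unweighted $1$-norm ball of radius `kappa` around each nominal distribution, for which the worst case has a well-known greedy solution. The function names, the `kappa` parameter and the toy MDP are all hypothetical.

```python
def worst_case_l1(z, pbar, kappa):
    """Minimize p.z over distributions p with ||p - pbar||_1 <= kappa.

    Greedy rule: shift up to kappa/2 probability mass onto the state with
    the smallest value, taking it from the states with the largest values.
    """
    n = len(z)
    i_min = min(range(n), key=lambda i: z[i])
    p = list(pbar)
    eps = min(kappa / 2.0, 1.0 - pbar[i_min])  # mass that can be moved
    p[i_min] += eps
    for i in sorted(range(n), key=lambda i: z[i], reverse=True):
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
    return sum(pi * zi for pi, zi in zip(p, z)), p


def robust_value_iteration(P, r, kappa, gamma=0.9, tol=1e-8, max_iter=10_000):
    """(s,a)-rectangular robust value iteration (illustrative assumption).

    P[s][a] is the nominal next-state distribution and r[s][a] the reward.
    """
    v = [0.0] * len(P)
    for _ in range(max_iter):
        v_new = [
            max(
                r[s][a] + gamma * worst_case_l1(v, P[s][a], kappa)[0]
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(v, v_new)) < tol:
            return v_new
        v = v_new
    return v


# Toy MDP (hypothetical): state 0 has two actions, state 1 has one.
P = [[[0.5, 0.5], [0.9, 0.1]], [[0.2, 0.8]]]
r = [[1.0, 0.5], [0.0]]
v_nominal = robust_value_iteration(P, r, kappa=0.0)
v_robust = robust_value_iteration(P, r, kappa=0.2)
```

With `kappa=0.0` the ambiguity set collapses to the nominal kernel, so the routine reduces to classical value iteration. Any positive radius can only lower the resulting values. The $s$-rectangular case treated in the paper couples the actions within a state and needs the projection-based machinery instead of this greedy step.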

Paper Structure (14 sections, 20 theorems, 92 equations, 9 figures, 7 tables, 5 algorithms)

Introduction
Bellman Updates for $s$-Rectangular Robust MDPs
$1$-Norm Ambiguity Sets
$2$-Norm Ambiguity Sets
Solution Set of the Nonlinear Equation \ref{eq:2norm:nonlinear_genericform}
Efficient Solution for Equation System \ref{eq:2norm:nonlinear_eqs}
$\phi$-Divergence Ambiguity Sets
Kullback-Leibler Divergence
Burg Entropy
Numerical Results
Projection Problems
Robust Bellman Operator
Robust Value Iteration
Use of Large Language Models.

Key Result

Theorem 1

Assume that the generalized $d_a$-projection \ref{eq:gen_projection} can be computed exactly in time $\mathcal{O} (h (S))$. Then the robust Bellman iteration \ref{eq:rob_value_it} can be computed for all states $s \in \mathcal{S}$ to accuracy $\epsilon > 0$ in time $\mathcal{O} (A S \cdot h (S) \cdot \log [\ove

Figures (9)

Figure 1: Problem \ref{eq:gen_projection} in $S = 3$ dimensions (a) and two-dimensional projections for the 1-norm (b), 2-norm (c) and the Kullback-Leibler divergence (d). The gray shaded areas represent the probability simplex $\Delta_S$, the red dashed lines show the boundaries of the intersections of the halfspaces $\bm{b}^\top \bm{p}_{sa} \leq \beta$ with the probability simplices, and the white shapes illustrate contour lines centered at the nominal transition probabilities $\overline{\bm{p}}_{sa}$.
Figure 2: From left to right: lines $b_i \alpha + \sigma_{sai}$, $i = 1, \ldots, S$, before Step 1 (a); lines $b_{i_j} \alpha + \sigma_{sa i_j}$, $i_1, \ldots, i_m$, after Step 1 (b); lines $b_{j_k} \alpha + \sigma_{sa j_k}$, $j_1, \ldots, j_n$, after Step 2 (c); line segments $b_{j_k} \alpha + \sigma_{sa j_k}$, $j_1, \ldots, j_n$ (possibly relabeled), after removing negative breakpoints $\alpha_j$ (d). Note that $\bm{b} \geq \bm{0}$ and hence all line segments are non-decreasing in this section; we chose to include negative slopes in this figure to aid visual clarity.
Figure 3: For the expression $\min \{ \bm{b} \alpha + \bm{\sigma}_{sa} \}$ plotted in the left graph, Algorithm \ref{alg:1norm:plus_BkPts} determines the two additional breakpoints $\alpha^0_1$ and $\alpha^0_2$ in the right graph.
Figure 4: For the three component functions $a_i (- b_i \alpha + \gamma^\star (\alpha) + c_i)$ in solid red, dotted black and dashed green, Algorithm \ref{alg:2norm:gamma_form} computes the solution set of equation \ref{eq:2norm:nonlinear_genericform} in $5$ iterations with the breakpoints $\alpha_1, \ldots, \alpha_4$. The colored bars below the graph indicate which component function indices $i$ are contained in each set $\mathcal{I}_t$.
Figure 5: Median runtimes (in $\mu\text{s}$) of the projection problems for the $\ell_1$-norm (top left), $\ell_2$-norm (top right), KL divergence (bottom left), and Burg entropy (bottom right).
...and 4 more figures

Theorems & Definitions (39)

Theorem 1
Theorem 2
Proposition 1
proof
proof
Lemma 1
proof
Lemma 2
proof
Theorem 3
...and 29 more
