Bounding the Difference between the Values of Robust and Non-Robust Markov Decision Problems
Ariel Neufeld, Julian Sester
TL;DR
The paper analyzes the gap between the value of a distributionally robust Markov decision process (MDP) and the value of a non-robust MDP when the ambiguity set of transition kernels is a $q$-Wasserstein ball of radius $\varepsilon$ around a reference kernel $\widehat{\mathbb{P}}$. Using a dynamic-programming framework, it shows that, under Lipschitz and regularity assumptions, the difference between the non-robust value $V^{\operatorname{true}}$ (computed under a fixed true kernel contained in the ambiguity set) and the robust value $V$ satisfies $0 \le V^{\operatorname{true}}(x_0) - V(x_0) \le 2 L_r \varepsilon (1+\alpha) \sum_{i=0}^{\infty} \alpha^{i} \sum_{j=0}^{i} (L_P)^j$. Here $L_r$ and $L_P$ are the Lipschitz constants and $\alpha$ is the discount factor. The bound is dimension-free and linear in $\varepsilon$, and it becomes tighter when the true kernel coincides with the reference kernel $\widehat{\mathbb{P}}$. The results extend to autocorrelated time series via state augmentation. The proofs combine dynamic-programming operators, Lipschitz bounds, and optimal couplings to control how kernel uncertainty propagates through the value function.
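To make the size of the bound concrete, the following minimal Python sketch evaluates the right-hand side of the inequality above numerically. The function names and the sample parameter values are illustrative and not taken from the paper. The sketch assumes $0 \le \alpha < 1$ and $\alpha L_P < 1$ so that the double series converges; under that assumption, summing the geometric series gives the closed form $2 L_r \varepsilon (1+\alpha) / \big((1-\alpha)(1-\alpha L_P)\big)$, which serves as a cross-check on the truncated sum.

```python
def gap_bound_truncated(L_r, L_P, alpha, eps, n_terms=2000):
    """Truncated evaluation of
    2 * L_r * eps * (1 + alpha) * sum_{i>=0} alpha^i * sum_{j=0}^{i} L_P^j.

    Assumption (for convergence): 0 <= alpha < 1 and alpha * L_P < 1.
    The truncation is accurate only if alpha * max(1, L_P) is not too close to 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if alpha * L_P >= 1.0:
        raise ValueError("series diverges unless alpha * L_P < 1")

    total = 0.0
    inner = 0.0      # running value of sum_{j=0}^{i} L_P^j
    lp_pow = 1.0     # L_P^i
    alpha_pow = 1.0  # alpha^i
    for _ in range(n_terms):
        inner += lp_pow
        total += alpha_pow * inner
        lp_pow *= L_P
        alpha_pow *= alpha
    return 2.0 * L_r * eps * (1.0 + alpha) * total


def gap_bound_closed_form(L_r, L_P, alpha, eps):
    """Closed form of the same expression, valid when alpha * L_P < 1."""
    if not 0.0 <= alpha < 1.0 or alpha * L_P >= 1.0:
        raise ValueError("need 0 <= alpha < 1 and alpha * L_P < 1")
    return 2.0 * L_r * eps * (1.0 + alpha) / ((1.0 - alpha) * (1.0 - alpha * L_P))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    params = dict(L_r=1.0, L_P=0.5, alpha=0.9, eps=0.1)
    print(gap_bound_truncated(**params))
    print(gap_bound_closed_form(**params))
```

Doubling `eps` doubles the returned value, which reflects the linear dependence on the Wasserstein radius. The state-space dimension does not appear among the arguments, in line with the bound being dimension-free.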
Abstract
In this note we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem. The ambiguity set of probability kernels of the distributionally robust Markov decision process is described by a Wasserstein ball around some reference kernel, whereas the non-robust Markov decision process behaves according to a fixed probability kernel contained in the ambiguity set. Our derived upper bound for the difference between the value functions is dimension-free and depends linearly on the radius of the Wasserstein ball.
