Bounding the Difference between the Values of Robust and Non-Robust Markov Decision Problems

Ariel Neufeld; Julian Sester

TL;DR

The paper analyzes the gap between the value of a distributionally robust Markov decision process (MDP) and the value of a non-robust MDP when the ambiguity set of transition kernels is a $q$-Wasserstein ball around a reference kernel $\widehat{\mathbb{P}}$. Using a dynamic-programming framework, it shows that, under Lipschitz and regularity assumptions, the difference between the non-robust value $V^{\operatorname{true}}$, computed under a fixed true kernel contained in the ambiguity set, and the robust value $V$ is bounded by a dimension-free expression that is linear in the Wasserstein radius $\varepsilon$ and depends on the Lipschitz constants $L_r$ and $L_P$ and the discount factor $\alpha$: $0 \le V^{\operatorname{true}}(x_0) - V(x_0) \le 2 L_r \varepsilon (1+\alpha) \sum_{i=0}^{\infty} \alpha^{i} \sum_{j=0}^{i} (L_P)^j$, with a tighter bound when the true kernel coincides with the reference kernel $\widehat{\mathbb{P}}$. The results extend to autocorrelated time series via state augmentation. The proofs combine DP operators, Lipschitz bounds, and optimal couplings to control how kernel uncertainty propagates through the value function.
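To get a feel for the size of the bound above, the double series can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and the closed form $\sum_{i=0}^{\infty} \alpha^{i} \sum_{j=0}^{i} (L_P)^j = \frac{1}{(1-\alpha)(1-\alpha L_P)}$ is a simplification obtained here by swapping the order of summation, assuming $0 \le \alpha < 1$ and $\alpha L_P < 1$ (otherwise the series diverges).

```python
def bound_series(alpha, L_P, n_terms=500):
    """Truncated value of sum_{i>=0} alpha^i * sum_{j=0}^{i} L_P^j.

    Uses the recursion s_{i+1} = alpha * s_i + (alpha * L_P)^(i+1), where
    s_i = alpha^i * sum_{j<=i} L_P^j, to avoid overflow when L_P > 1.
    """
    total = 0.0
    s = 1.0  # s_0 = alpha^0 * L_P^0
    q = 1.0  # (alpha * L_P)^i
    for _ in range(n_terms):
        total += s
        q *= alpha * L_P
        s = alpha * s + q
    return total


def value_gap_bound(L_r, L_P, alpha, eps, n_terms=500):
    """Upper bound 2 * L_r * eps * (1 + alpha) * (double series), truncated."""
    return 2.0 * L_r * eps * (1.0 + alpha) * bound_series(alpha, L_P, n_terms)


def value_gap_bound_closed_form(L_r, L_P, alpha, eps):
    """Same bound via the closed form; an assumption-laden simplification,
    valid only if 0 <= alpha < 1 and alpha * L_P < 1."""
    if not (0.0 <= alpha < 1.0 and alpha * L_P < 1.0):
        raise ValueError("series diverges unless alpha < 1 and alpha * L_P < 1")
    return 2.0 * L_r * eps * (1.0 + alpha) / ((1.0 - alpha) * (1.0 - alpha * L_P))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for eps in (0.05, 0.1, 0.2):
        print(eps, value_gap_bound(1.0, 1.5, 0.5, eps),
              value_gap_bound_closed_form(1.0, 1.5, 0.5, eps))
```

Doubling `eps` doubles the returned value, which reflects the linear dependence on the Wasserstein radius; the bound grows quickly as `alpha * L_P` approaches 1.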

Abstract

In this note we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem, where the ambiguity set of probability kernels of the distributionally robust Markov decision process is described by a Wasserstein-ball around some reference kernel whereas the non-robust Markov decision process behaves according to a fixed probability kernel contained in the ambiguity set. Our derived upper bound for the difference between the value functions is dimension-free and depends linearly on the radius of the Wasserstein-ball.

Paper Structure (8 sections, 5 theorems, 31 equations, 1 figure)

Introduction
Setting
Setting
Problem Formulation and Standing Assumptions
Main Result
Proof of the main result
Auxiliary Results
Proof of Theorem 3.1

Key Result

Theorem 3.1

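Let Assumptions asu_2--asu_3 hold true. Then $0 \le V^{\operatorname{true}}(x_0) - V(x_0) \le 2 L_r \varepsilon (1+\alpha) \sum_{i=0}^{\infty} \alpha^{i} \sum_{j=0}^{i} (L_P)^j$, the bound quoted in the TL;DR.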

Figures (1)

Figure 1: The difference between the non-robust and the robust value function, compared with the upper bound from Theorem 3.1, in the setting of Example 3.3 (Coin Toss), as a function of $\varepsilon>0$ and for different initial values of the MDP. Initial values larger than $5$ are omitted due to the setting-specific symmetry $V(x_0)-V^{\rm true}(x_0) = V(10-x_0)-V^{\rm true}(10-x_0)$ for $x_0\in \{0,1,\dots,10\}$.

Theorems & Definitions (11)

Theorem 3.1
Remark 3.2
Example 3.3: Coin Toss
Lemma 4.1
proof
Lemma 4.2
proof
Lemma 4.3
proof
Lemma 4.4
...and 1 more
