
Bounding the Difference between the Values of Robust and Non-Robust Markov Decision Problems

Ariel Neufeld, Julian Sester

TL;DR

The paper analyzes the gap between the value of a distributionally robust Markov decision process (MDP) and the value of a non-robust MDP when the ambiguity set of transition kernels is a $q$-Wasserstein ball of radius $\varepsilon$ around a reference kernel $\widehat{\mathbb{P}}$. Using a dynamic-programming framework, it shows that, under Lipschitz and regularity assumptions, the difference between the non-robust value $V^{\operatorname{true}}$ and the robust value $V$ is bounded by a dimension-free expression that is linear in $\varepsilon$ and depends on the Lipschitz constants $L_r$ and $L_P$ and the discount factor $\alpha$: $0 \le V^{\operatorname{true}}(x_0) - V(x_0) \le 2 L_r \varepsilon (1+\alpha) \sum_{i=0}^{\infty} \alpha^{i} \sum_{j=0}^{i} (L_P)^j$, with a tighter bound when the true kernel coincides with the reference kernel $\widehat{\mathbb{P}}$. The results extend to autocorrelated time series via state augmentation. The proofs combine dynamic-programming operators, Lipschitz bounds, and optimal couplings to control how kernel uncertainty propagates through the value function.
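To get a feel for the size of this bound, the double series can be evaluated numerically. The sketch below is an illustration, not code from the paper: the function names and the truncation length `n_terms` are assumptions, and the convergence condition $\alpha L_P < 1$ with $0 \le \alpha < 1$ is imposed here so that the series is finite. Under that condition, summing the inner geometric series gives the closed form $2 L_r \varepsilon (1+\alpha) / \big((1-\alpha)(1-\alpha L_P)\big)$, which the code uses as a cross-check.

```python
def gap_bound_series(L_r, L_P, alpha, eps, n_terms=2000):
    """Truncated value of 2*L_r*eps*(1+alpha) * sum_i alpha^i * sum_{j<=i} L_P^j.

    Hypothetical helper for illustration; n_terms is an assumed truncation length.
    """
    if not (0.0 <= alpha < 1.0 and alpha * L_P < 1.0):
        raise ValueError("series is only finite for 0 <= alpha < 1 and alpha * L_P < 1")
    total = 0.0
    a = 1.0  # a_i = alpha^i * sum_{j=0}^{i} L_P^j, starting at i = 0
    t = 1.0  # t_i = (alpha * L_P)^i
    for _ in range(n_terms):
        total += a
        t *= alpha * L_P
        a = alpha * a + t  # recursion a_{i+1} = alpha * a_i + (alpha * L_P)^(i+1)
    return 2.0 * L_r * eps * (1.0 + alpha) * total


def gap_bound_closed_form(L_r, L_P, alpha, eps):
    """Closed form of the same series, valid when alpha < 1 and alpha * L_P < 1."""
    return 2.0 * L_r * eps * (1.0 + alpha) / ((1.0 - alpha) * (1.0 - alpha * L_P))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for eps in (0.05, 0.1, 0.2):
        print(eps, gap_bound_series(1.0, 1.0, 0.5, eps), gap_bound_closed_form(1.0, 1.0, 0.5, eps))
```

Doubling `eps` doubles the output, which reflects the linear dependence on the Wasserstein radius. The state dimension does not enter the computation at all.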

Abstract

In this note we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem, where the ambiguity set of probability kernels of the distributionally robust Markov decision process is described by a Wasserstein-ball around some reference kernel whereas the non-robust Markov decision process behaves according to a fixed probability kernel contained in the ambiguity set. Our derived upper bound for the difference between the value functions is dimension-free and depends linearly on the radius of the Wasserstein-ball.

Paper Structure

This paper contains 8 sections, 5 theorems, 31 equations, and 1 figure.

Key Result

Theorem 3.1

Let Assumptions \ref{asu_2}--\ref{asu_3} hold true.

Figures (1)

  • Figure 1: The difference between the non-robust and the robust value function, compared with the upper bound from \eqref{eq_bound_main_thm_1}, in the setting of Example 3.3 (Coin Toss), as a function of $\varepsilon>0$ and for different initial values of the MDP. Initial values larger than $5$ are omitted because of the setting-specific symmetry $V(x_0)-V^{\rm true}(x_0) = V(10-x_0)-V^{\rm true}(10-x_0)$ for $x_0\in \{0,1,\dots,10\}$.

Theorems & Definitions (11)

  • Theorem 3.1
  • Remark 3.2
  • Example 3.3: Coin Toss
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • ...and 1 more