Distributionally Robust Safety Verification for Markov Decision Processes

Abhijit Mazumdar, Yuting Hou, Rafal Wisniewski

TL;DR

This paper derives an upper bound on the robust safety function in terms of a distributionally robust Q-function, and presents a convex program-based distributionally robust Q-iteration algorithm to compute that Q-function.

Abstract

In this paper, we propose a distributionally robust safety verification method for Markov decision processes where only an ambiguous transition kernel is available instead of the precise transition kernel. We define the ambiguity set around the nominal distribution by considering a Wasserstein distance. To this end, we introduce a robust safety function to characterize probabilistic safety in the face of uncertain transition probability. First, we obtain an upper bound on the robust safety function in terms of a distributionally robust Q-function. Then, we present a convex program-based distributionally robust Q-iteration algorithm to compute the robust Q-function. By considering a numerical example, we demonstrate our theoretical results.
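
For orientation, a Wasserstein ambiguity set of the kind mentioned in the abstract is commonly written as below. This is a sketch of the standard construction, not necessarily the paper's exact definition. The order-1 distance, the ground metric $d$ on the state space $\mathcal{X}$, the coupling $\gamma$, the nominal distribution $\hat{P}_{x,a}$, and the use of $\delta$ as the radius are all assumptions made here for illustration.

$$
W\big(P, \hat{P}_{x,a}\big) \;=\; \min_{\gamma \ge 0} \; \sum_{y, y' \in \mathcal{X}} d(y, y')\, \gamma(y, y')
\quad \text{s.t.} \quad \sum_{y'} \gamma(y, y') = P(y), \;\; \sum_{y} \gamma(y, y') = \hat{P}_{x,a}(y'),
$$

$$
\mathscr{P}_{x,a} \;=\; \big\{\, P \in \Delta(\mathcal{X}) \;:\; W\big(P, \hat{P}_{x,a}\big) \le \delta \,\big\}.
$$

Under these assumptions the distance is the optimal value of a linear program in $\gamma$. The worst-case expectation of a fixed function over $\mathscr{P}_{x,a}$ is therefore also a convex (in fact linear) program, which is consistent with the convex program-based Q-iteration the abstract describes.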

Paper Structure

This paper contains 7 sections, 7 theorems, 25 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 1

The robust safety function $S^{\delta,\mathscr{P}}_{\pi}(x)$ can be expressed as follows: where $\tau = \tau_{E\cup U}$ and $\kappa^{\tilde{\mathscr{P}}}(x,a)= \sum_{y \in U} \tilde{P}_{x,a}(y)$.

Figures (1)

  • Figure 2: Example MDP Diagram with Goal and Forbidden States

Theorems & Definitions (18)

  • Definition 1: Safety function
  • Definition 2: Robust safety function
  • Definition 3: Robust $p$-safety
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Definition 4
  • Lemma 3
  • proof
  • ...and 8 more