Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

Alberto Maria Metelli

TL;DR

This paper bounds the Wasserstein distance between the $\gamma$-discounted stationary distributions induced by changing the policy and the configuration, and derives a novel performance improvement lower bound for Conf-MDPs.

Abstract

Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of traditional Markov Decision Processes (MDPs) to model real-world scenarios in which it is possible to intervene in the environment and configure some of its parameters. In this paper, we focus on a subclass of Conf-MDPs that satisfies a regularity condition, namely Lipschitz continuity. We first provide a bound on the Wasserstein distance between the $\gamma$-discounted stationary distributions induced by changing the policy and the configuration. This result generalizes existing bounds for both Conf-MDPs and traditional MDPs. We then derive a novel performance improvement lower bound.
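
The abstract's first result bounds the Wasserstein distance between $\gamma$-discounted stationary distributions. To make those two objects concrete, here is a minimal sketch for a finite Conf-MDP, assuming (our choice, not the paper's) that states are points on the real line, so the ground metric is $|x - y|$. It computes the state distribution $d^{\pi,p}_\mu = (1-\gamma)\,\mu^\top (I - \gamma P^{\pi,p})^{-1}$ with $P^{\pi,p}(s' \mid s) = \sum_a \pi(a \mid s)\,p(s' \mid s,a)$, and the 1-Wasserstein distance between the distributions induced by two policy-configuration pairs. The function names and perturbations are hypothetical, and the code evaluates the quantity being bounded, not the paper's bound.

```python
import numpy as np


def discounted_stationary(mu, pi, p, gamma):
    """gamma-discounted state distribution d = (1 - gamma) * mu^T (I - gamma * P_pi)^{-1}.

    mu: (S,) initial-state distribution
    pi: (S, A) policy, pi[s, a] = pi(a | s)
    p:  (S, A, S) configuration, p[s, a, t] = p(t | s, a)
    """
    # State-to-state kernel induced by the pair (pi, p)
    P_pi = np.einsum("sa,sat->st", pi, p)
    # Solve (I - gamma * P_pi)^T d = (1 - gamma) * mu instead of inverting
    return np.linalg.solve((np.eye(len(mu)) - gamma * P_pi).T, (1 - gamma) * mu)


def wasserstein_1d(d1, d2, positions):
    """1-Wasserstein distance between two distributions on sorted points of the
    real line with ground metric |x - y|: the integral of |F1 - F2|."""
    cdf_gap = np.abs(np.cumsum(d1) - np.cumsum(d2))[:-1]
    return float(np.sum(cdf_gap * np.diff(positions)))


# Usage: a random Conf-MDP whose S states sit at positions 0, 1, ..., S-1
rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
positions = np.arange(S, dtype=float)
mu = np.full(S, 1 / S)
p = rng.dirichlet(np.ones(S), size=(S, A))     # configuration p(. | s, a)
pi = rng.dirichlet(np.ones(A), size=S)         # policy pi(. | s)
p_new = 0.8 * p + 0.2 * np.roll(p, 1, axis=2)  # perturbed configuration
pi_new = 0.9 * pi + 0.1 / A                    # perturbed policy

d_old = discounted_stationary(mu, pi, p, gamma)
d_new = discounted_stationary(mu, pi_new, p_new, gamma)
print(wasserstein_1d(d_old, d_new, positions))
```

Solving the linear system rather than summing $\sum_t \gamma^t (P^{\pi,p})^t$ gives the exact distribution for a finite state space; for continuous spaces one would instead estimate it from sampled trajectories.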

Paper Structure

This paper contains 25 sections, 14 theorems, and 64 equations.

Introduction
Preliminaries
Mathematical Background
Probability
Lipschitz Continuity
Wasserstein Distance
Configurable Markov Decision Processes
Value Functions
$\gamma$-discounted Stationary Distributions
Lipschitz Configurable Markov Decision Processes
Lipschitz semi-norms of the Value functions
Bound on the $\gamma$-discounted Stationary Distribution
Coupled Bound
Decoupled Bound
Comparison with Existing Bounds

Key Result

Lemma 1

Let $\mathcal{C}$ be an $L_r$-LC (Lipschitz continuous) Conf-MDP, $p \in \Delta^{\mathcal{S}}_{\mathcal{S}\times\mathcal{A}}$ be an $L_p$-LC configuration, and $\pi \in \Delta^{\mathcal{A}}_{\mathcal{S}}$ be an $L_\pi$-LC policy. Then the state-action-next-state value function $U^{\pi,p}$ is LC, under the assumption that …
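
Lemma 1 concerns the Lipschitz semi-norm of $U^{\pi,p}$. As a minimal sketch, assuming the definition $U^{\pi,p}(s,a,s') = r(s,a,s') + \gamma V^{\pi,p}(s')$ (as in Metelli et al., 2018), a finite grid of states and actions, and an $L_1$ product metric on $(s,a,s')$ triples (all our assumptions), the snippet evaluates $U^{\pi,p}$ and computes its Lipschitz semi-norm by brute force over pairs of points. On a finite set this quantity is always finite, so the code illustrates the semi-norm rather than checking the lemma's hypothesis; all helper names are hypothetical.

```python
import itertools

import numpy as np


def value_function(pi, p, r, gamma):
    """V^{pi,p} of a finite Conf-MDP with reward r[s, a, t] = r(s, a, t)."""
    P_pi = np.einsum("sa,sat->st", pi, p)         # state kernel under (pi, p)
    r_pi = np.einsum("sa,sat,sat->s", pi, p, r)   # expected one-step reward
    # Bellman equation V = r_pi + gamma * P_pi V, solved as a linear system
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


def u_function(pi, p, r, gamma):
    """Assumed definition U^{pi,p}(s, a, t) = r(s, a, t) + gamma * V^{pi,p}(t)."""
    V = value_function(pi, p, r, gamma)
    return r + gamma * V[None, None, :]


def l1_distance(x, y):
    """Hypothetical product metric on (s, a, s') triples of grid indices."""
    return sum(abs(i - j) for i, j in zip(x, y))


def lipschitz_seminorm(f, points, dist):
    """Brute-force max of |f(x) - f(y)| / dist(x, y) over distinct pairs of a finite set."""
    best = 0.0
    for x, y in itertools.combinations(points, 2):
        best = max(best, abs(f[x] - f[y]) / dist(x, y))
    return best


# Usage on a small random Conf-MDP with a reward that is Lipschitz in (a, s')
rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.9
p = rng.dirichlet(np.ones(S), size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)
_, a_idx, t_idx = np.meshgrid(np.arange(S), np.arange(A), np.arange(S), indexing="ij")
r = -0.1 * np.abs(t_idx - 2.0) + 0.05 * a_idx
U = u_function(pi, p, r, gamma)
points = list(itertools.product(range(S), range(A), range(S)))
print(lipschitz_seminorm(U, points, l1_distance))
```

The brute-force search is quadratic in the number of triples, which is fine for toy grids; the lemma's value lies in bounding this quantity analytically from $L_r$, $L_p$, and $L_\pi$ without enumerating pairs.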

Theorems & Definitions (24)

Lemma 1
proof
Theorem 4.1
proof
Corollary 1
Lemma 2: Lemma A.1 of Metelli et al. (2018)
Theorem 6.1: Performance Difference Lemma (Theorem 3.1 of Metelli et al., 2018)
Theorem 6.2: Coupled Bound
proof
Theorem 6.3: Decoupled Bound for Lipschitz Conf-MDPs
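
Theorems 6.1 to 6.3 start from a performance difference lemma for Conf-MDPs. As a hedged illustration, the snippet below numerically checks a performance-difference identity for a finite Conf-MDP in the form we assume here: returns $J = \mathbb{E}_{s \sim \mu}[V^{\pi,p}(s)]$ are unnormalized, and the expectation over $(s,a,s')$ is taken under the new pair's $\gamma$-discounted distribution with advantage $U^{\pi,p}(s,a,s') - V^{\pi,p}(s)$. The exact statement and normalization of Theorem 3.1 in Metelli et al. (2018) may differ, the snippet does not implement the coupled or decoupled bounds, and all helper names are hypothetical.

```python
import numpy as np


def policy_kernel(pi, p):
    """State-to-state kernel P_pi[s, t] = sum_a pi(a | s) p(t | s, a)."""
    return np.einsum("sa,sat->st", pi, p)


def value(pi, p, r, gamma):
    """V^{pi,p} from the Bellman equation V = r_pi + gamma * P_pi V."""
    r_pi = np.einsum("sa,sat,sat->s", pi, p, r)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * policy_kernel(pi, p), r_pi)


def discounted_stationary(mu, pi, p, gamma):
    """d = (1 - gamma) * mu^T (I - gamma * P_pi)^{-1}."""
    P_pi = policy_kernel(pi, p)
    return np.linalg.solve((np.eye(len(mu)) - gamma * P_pi).T, (1 - gamma) * mu)


def performance_difference(mu, pi, p, pi_new, p_new, r, gamma):
    """Right-hand side of the assumed identity
    J' - J = 1/(1 - gamma) * E_{s ~ d', a ~ pi', s' ~ p'}[U^{pi,p}(s, a, s') - V^{pi,p}(s)].
    """
    V = value(pi, p, r, gamma)
    # Advantage of the transition (s, a, s') w.r.t. the old pair: U(s, a, s') - V(s)
    advantage = r + gamma * V[None, None, :] - V[:, None, None]
    d_new = discounted_stationary(mu, pi_new, p_new, gamma)
    expected = np.einsum("s,sa,sat,sat->", d_new, pi_new, p_new, advantage)
    return float(expected) / (1 - gamma)


# Usage: compare the exact change in J = E_mu[V] with the identity's right-hand side
rng = np.random.default_rng(2)
S, A, gamma = 4, 3, 0.95
mu = np.full(S, 1 / S)
r = rng.uniform(-1.0, 1.0, size=(S, A, S))
p, p_new = rng.dirichlet(np.ones(S), size=(2, S, A))
pi, pi_new = rng.dirichlet(np.ones(A), size=(2, S))

exact_gap = float(mu @ value(pi_new, p_new, r, gamma) - mu @ value(pi, p, r, gamma))
identity_gap = performance_difference(mu, pi, p, pi_new, p_new, r, gamma)
print(exact_gap, identity_gap)
```

The two printed numbers should agree up to floating-point error, since the identity follows from telescoping $\sum_t \gamma^t \big(\gamma V^{\pi,p}(s_{t+1}) - V^{\pi,p}(s_t)\big)$ along trajectories generated by the new pair $(\pi', p')$.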
