Yet Another Distributional Bellman Equation

Nicole Bäuerle, Tamara Göll, Anna Jaśkiewicz

TL;DR

This work develops a general framework for Markov Decision Processes where the objective is a functional of the joint distribution of the terminal state and accumulated reward, rather than a simple expectation. By lifting the problem to the space of distributions $P(E\times S)$ and introducing a distributional Bellman operator, the authors derive a recursive dynamic program that uniformly handles classical MDPs, quantile criteria, risk-sensitive criteria, and optimal transport-style objectives. They establish existence and structure results for finite-horizon problems, extend to infinite horizon with conditions ensuring existence and approximability, and demonstrate that classical MDPs and quantile optimization are natural special cases. The framework also yields valuable applications, including OT-type problems and portfolio/risk-shaping scenarios, highlighting a flexible tool for shaping reward distributions in decision problems.

Abstract

We consider non-standard Markov Decision Processes (MDPs) where the target function is not only a simple expectation of the accumulated reward. Instead, we consider rather general functionals of the joint distribution of terminal state and accumulated reward which have to be optimized. For finite state and compact action space, we show how to solve these problems by defining a lifted MDP whose state space is the space of distributions over the true states of the process. We derive a Bellman equation in this setting, which can be considered as a distributional Bellman equation. Well-known cases like the standard MDP and quantile MDPs are shown to be special examples of our framework. We also apply our model to a variant of an optimal transport problem.
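To make the idea of "a functional of the joint distribution" concrete, here is a minimal sketch, not taken from the paper. It stores a joint distribution over (terminal state, accumulated reward) as a Python dictionary and evaluates two criteria on it: the classical expectation and a lower quantile of the reward. The dictionary layout, the function names, and all numbers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical joint distribution F over (terminal state, accumulated reward),
# stored as {(state, reward): probability}. The numbers are made up.
F = {
    ("a", 0.0): 0.2,
    ("a", 2.0): 0.3,
    ("b", 1.0): 0.4,
    ("b", 5.0): 0.1,
}

def reward_marginal(F):
    """Marginal distribution of the accumulated reward."""
    marg = defaultdict(float)
    for (_, s), p in F.items():
        marg[s] += p
    return dict(marg)

def expected_reward(F):
    """Classical criterion: a linear functional of F."""
    return sum(s * p for (_, s), p in F.items())

def reward_quantile(F, alpha):
    """Lower alpha-quantile: smallest s with P(reward <= s) >= alpha."""
    cum = 0.0
    for s, p in sorted(reward_marginal(F).items()):
        cum += p
        if cum >= alpha - 1e-12:  # small tolerance for float round-off
            return s
    raise ValueError("alpha must lie in (0, 1]")

print(expected_reward(F))        # expectation criterion
print(reward_quantile(F, 0.5))   # median of the accumulated reward
```

Both criteria read only the distribution $F$, which is why a single lifted formulation can cover them; the quantile is not linear in $F$, so it does not fit the classical expectation-based recursion directly.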

Paper Structure

This paper contains 10 sections, 5 theorems, 69 equations, 3 figures, 1 algorithm.

Key Result

Proposition 2.2

For every policy $\sigma=(\sigma_n)_{n=0}^{N-1}$ in the original MDP model, there exists an action sequence $(\pi_0,\ldots,\pi_{N-1})$ in the lifted MDP such that […], where $F_0= \nu \otimes \delta_0$ and $T^\pi$ is the operator defined in (eq:Toperator).
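The proposition starts from $F_0=\nu\otimes\delta_0$ and applies operators $T^\pi$ to it. The sketch below illustrates that kind of iteration for a finite state space. The definition of $T^\pi$ is not reproduced in this summary, so `lifted_step` is only an assumed push-forward form. It assumes a decision rule that may depend on the current state and the accumulated reward, and a reward that depends on the state and action. The toy model and all names are hypothetical.

```python
from collections import defaultdict

def initial_distribution(nu):
    """F_0 = nu (x) delta_0: initial state law nu, accumulated reward 0."""
    return {(x, 0.0): p for x, p in nu.items()}

def lifted_step(F, decision_rule, kernel, reward):
    """One push-forward step, an assumed stand-in for T^pi.

    F: {(x, s): prob}; decision_rule(x, s) -> action;
    kernel(x, a) -> {x_next: prob}; reward(x, a) -> float.
    """
    out = defaultdict(float)
    for (x, s), p in F.items():
        a = decision_rule(x, s)
        for x_next, q in kernel(x, a).items():
            out[(x_next, s + reward(x, a))] += p * q
    return dict(out)

# Toy two-state model; every number here is hypothetical.
nu = {0: 0.5, 1: 0.5}

def kernel(x, a):
    # "mix" moves to either state with probability 1/2, "stay" keeps the state.
    return {x: 0.5, 1 - x: 0.5} if a == "mix" else {x: 1.0}

def reward(x, a):
    return float(x)  # earn 1 for each step that starts in state 1

def rule(x, s):
    # Decision rule that looks at the accumulated reward s.
    return "mix" if s < 1.0 else "stay"

F = initial_distribution(nu)
for _ in range(2):  # horizon N = 2
    F = lifted_step(F, rule, kernel, reward)

for (x, s), p in sorted(F.items()):
    print(f"state={x} reward={s} prob={p}")
```

The loop tracks the whole joint law of state and accumulated reward rather than a scalar value, so any functional of the terminal distribution can be evaluated once the loop finishes.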

Figures (3)

  • Figure 1: Average Wasserstein distance of $F_N$ from Algorithm 1 and the target distribution $G$ (rescaled normal distribution) over $m=100$ initial samples for $\sigma\in \{0.5,1,2,5\}$ and $K\in \{50,100\}$.
  • Figure 2: Boxplots of the Wasserstein distance of $F_n$ from Algorithm 1 and the target distribution $G$ (rescaled normal distribution) using the data from the $m=100$ initial samples for $\sigma\in \{0.5,1,2,5\}$.
  • Figure 3: Average Wasserstein distance of $F_N$ from Algorithm 1 and the target distribution $G$ (rescaled shifted exponential distribution) over $m=100$ initial samples for $\lambda\in \{0.5,1,2,5\}$ and $K\in \{50,100\}$.
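
The figures report Wasserstein distances between a computed distribution and a target. This summary does not say which order of the distance is used or how it is computed, so the following is only an illustrative sketch. It assumes the order-1 distance between two one-dimensional empirical distributions with equally many, equally weighted atoms, where the distance reduces to the mean absolute difference of the sorted samples. The function name and the sample data are hypothetical.

```python
import random

def wasserstein1_empirical(xs, ys):
    """W_1 between two 1-D empirical laws with the same number of atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch needs equal sample sizes")
    # In one dimension the optimal coupling matches sorted samples in order.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

rng = random.Random(0)  # fixed seed so the run is reproducible
target = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [t + 0.5 for t in target]  # same shape, moved right by 0.5

print(wasserstein1_empirical(target, shifted))
```

Shifting every sample by a constant moves the distribution by exactly that amount, so the printed value should be 0.5 up to floating-point error, which makes this a convenient sanity check.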

Theorems & Definitions (15)

  • Example 2.1
  • Proposition 2.2
  • proof
  • Remark 2.3
  • Theorem 2.4
  • proof
  • Remark 2.5
  • Theorem 3.1
  • proof
  • Example 4.1
  • ...and 5 more