Yet Another Distributional Bellman Equation
Nicole Bäuerle, Tamara Göll, Anna Jaśkiewicz
TL;DR
This work develops a general framework for Markov Decision Processes in which the objective is a functional of the joint distribution of the terminal state and the accumulated reward, rather than a simple expectation. By lifting the problem to the space of joint distributions $P(E\times S)$ and introducing a distributional Bellman operator, the authors derive a recursive dynamic program that handles classical MDPs, quantile criteria, risk-sensitive criteria, and optimal-transport-style objectives in a unified way. They establish existence and structure results for finite-horizon problems, extend the analysis to the infinite horizon under conditions that ensure existence and approximability, and show that classical MDPs and quantile optimization arise as special cases. Applications include optimal-transport-type problems and portfolio and risk-shaping scenarios, where the goal is to shape the reward distribution itself rather than only its mean.
Abstract
We consider non-standard Markov Decision Processes (MDPs) in which the target function is not simply the expectation of the accumulated reward. Instead, we optimize rather general functionals of the joint distribution of the terminal state and the accumulated reward. For a finite state space and a compact action space, we show how to solve these problems by defining a lifted MDP whose state space is the space of distributions over the true states of the process. We derive a Bellman equation in this setting, which can be regarded as a distributional Bellman equation. Well-known cases such as the standard MDP and quantile MDPs are shown to be special cases of our framework. We also apply our model to a variant of an optimal transport problem.
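
To make the lifting idea concrete, the following is a minimal sketch of one possible reading of it, not the authors' construction. It assumes a finite action set (the abstract allows a compact one), deterministic one-step rewards, and a decision rule that may depend on the current state and the reward accumulated so far. All names (`lift_step`, `reward_quantile`, `expected_reward`, and the two-state toy model) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

def lift_step(mu, decision_rule, transition, reward):
    """Push a joint distribution over (state, accumulated reward) one step forward.

    mu:            dict mapping (state, acc_reward) -> probability
    decision_rule: function (state, acc_reward) -> action   (assumed form)
    transition:    dict mapping (state, action) -> {next_state: probability}
    reward:        function (state, action, next_state) -> float
    """
    nxt = defaultdict(float)
    for (s, acc), p in mu.items():
        a = decision_rule(s, acc)
        for s2, q in transition[(s, a)].items():
            nxt[(s2, acc + reward(s, a, s2))] += p * q
    return dict(nxt)

def expected_reward(mu):
    """Classical criterion: expectation of the accumulated reward under mu."""
    return sum(acc * p for (_, acc), p in mu.items())

def reward_quantile(mu, alpha):
    """Lower alpha-quantile of the accumulated-reward marginal of mu."""
    marginal = defaultdict(float)
    for (_, acc), p in mu.items():
        marginal[acc] += p
    cum = 0.0
    for acc in sorted(marginal):
        cum += marginal[acc]
        if cum >= alpha - 1e-12:  # small tolerance for float round-off
            return acc
    return max(marginal)

# Hypothetical toy model: state 0 is "active", state 1 is absorbing.
transition = {
    (0, "safe"): {0: 1.0},
    (0, "risky"): {0: 0.5, 1: 0.5},
    (1, "safe"): {1: 1.0},
    (1, "risky"): {1: 1.0},
}

def reward(s, a, s2):
    if s == 0 and a == "safe":
        return 1.0
    if s == 0 and a == "risky" and s2 == 0:
        return 3.0
    return 0.0

mu0 = {(0, 0.0): 1.0}  # start in state 0 with no reward accumulated
mu_safe = lift_step(mu0, lambda s, acc: "safe", transition, reward)
mu_risky = lift_step(mu0, lambda s, acc: "risky", transition, reward)

for name, mu in [("safe", mu_safe), ("risky", mu_risky)]:
    print(name, expected_reward(mu), reward_quantile(mu, 0.25))
```

In this toy model the two criteria disagree after one step: the risky action has the larger expected reward (1.5 against 1.0), while the safe action has the larger 0.25-quantile (1.0 against 0.0). Both numbers are functionals of the same lifted state `mu`, which is why tracking the joint distribution rather than a scalar value lets a single recursion serve several objectives.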
