Initial Distribution Sensitivity of Constrained Markov Decision Processes
Alperen Tercan, Necmiye Ozay
TL;DR
This work studies how the optimal value of a CMDP depends on the initial state distribution $\beta$. It develops three bounds on $V^*(\beta)$, based on LP duality, perturbation theory, and the concavity of LP values. The bounds make it possible to assess robustness and to derive inner approximations to $(0,\epsilon)$-regret sets without re-solving the CMDP for every $\beta$; they are validated on random CMDPs and a water-pendulum example. The key contributions are a practical duality-based bound, a perturbation-based bound with both upper and lower guarantees, and a concavity-based bound, together with demonstrations of robustness analysis and minimal-regret computation over sets of distributions. These results offer efficient tools for planning under initial-distribution uncertainty and for designing policies that remain near-optimal as $\beta$ varies, with potential extensions to uncertain transition dynamics.
Abstract
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
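To make the duality argument concrete, the following is a minimal sketch of how a bound of this type can arise. It assumes a discounted CMDP written as a linear program over occupancy measures. The symbols $x$, $r$, $c_k$, $d_k$, $P$, $\gamma$, $v$, and $\lambda$ are illustrative notation that does not appear in the text above, and the normalization may differ from the one the authors use. The point it illustrates is that $\beta$ enters only the right-hand side of the primal LP, so a dual solution computed for one $\beta$ stays dual-feasible for any other $\beta'$.

```latex
% Primal LP (assumed occupancy-measure form), with optimal value V^*(\beta):
\begin{align*}
V^*(\beta) = \max_{x \ge 0} \quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) = \beta(s') \quad \forall s', \\
& \sum_{s,a} c_k(s,a)\, x(s,a) \le d_k, \quad k = 1,\dots,K.
\end{align*}

% Dual LP: \beta appears only in the objective, not in the constraints.
\begin{align*}
\min_{v,\ \lambda \ge 0} \quad & \beta^\top v + d^\top \lambda \\
\text{s.t.} \quad & v(s) - \gamma \sum_{s'} P(s' \mid s,a)\, v(s') + \sum_{k} \lambda_k\, c_k(s,a) \ge r(s,a) \quad \forall (s,a).
\end{align*}

% Let (v^*, \lambda^*) be dual-optimal for \beta. It is dual-feasible for every \beta',
% so weak duality at \beta' and strong duality at \beta give, whenever the primal is feasible at \beta':
\begin{equation*}
V^*(\beta') \;\le\; \beta'^\top v^* + d^\top \lambda^* \;=\; V^*(\beta) + (\beta' - \beta)^\top v^* .
\end{equation*}
```

Under these assumptions, one solve at a nominal $\beta$ yields an upper bound on $V^*(\beta')$ that is linear in $\beta'$. The inequality is a supergradient inequality, which is consistent with $V^*$ being concave in $\beta$.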
