Initial Distribution Sensitivity of Constrained Markov Decision Processes

Alperen Tercan, Necmiye Ozay

TL;DR

This work addresses how CMDP performance depends on the initial state distribution $\beta$ by developing three bounds on the optimal value $V^*(\beta)$, based on LP duality, perturbation theory, and concavity of LP values. The bounds make it possible to assess robustness and to derive inner approximations to $(0,\epsilon)$-regret sets without re-solving the CMDP for every $\beta$; they are validated on random CMDPs and a water-pendulum example. The key contributions are a practical duality-based bound, a perturbation-based bound with both upper and lower guarantees, and a concavity-based bound, together with demonstrations of robustness analysis and minimal-regret computation over distribution sets. These results offer efficient tools for planning under initial-distribution uncertainty and for designing policies that remain near-optimal as $\beta$ varies, with potential extensions to uncertain transition dynamics.

Abstract

Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
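
To make the dependence on the initial distribution concrete, the following is a minimal sketch, not the paper's method. It assumes a discounted, reward-maximizing CMDP with a single cost constraint, and every number in it (transition probabilities, rewards, costs, discount factor, budget) is hypothetical. Instead of solving an LP, it relies on the standard fact that the achievable occupancy measures are the convex hull of those of deterministic stationary policies, so with one constraint an optimum is attained by one deterministic policy or a mixture of two. The helper names `evaluate` and `optimal_value` are illustrative only.

```python
from itertools import product

GAMMA = 0.9    # discount factor (assumed)
BUDGET = 2.0   # bound on expected discounted cost (assumed)

# Hypothetical 2-state, 2-action CMDP; P[s][a] is the next-state distribution.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 1.0], [0.5, 2.0]]   # rewards r(s, a)
C = [[0.0, 1.0], [0.0, 1.5]]   # costs c(s, a); action 0 is free, so a feasible policy exists


def evaluate(policy, payoff):
    """Solve (I - GAMMA * P_pi) v = payoff_pi for two states by Cramer's rule."""
    p0, p1 = P[0][policy[0]], P[1][policy[1]]
    a, b = 1 - GAMMA * p0[0], -GAMMA * p0[1]
    c, d = -GAMMA * p1[0], 1 - GAMMA * p1[1]
    y0, y1 = payoff[0][policy[0]], payoff[1][policy[1]]
    det = a * d - b * c
    return ((y0 * d - b * y1) / det, (a * y1 - c * y0) / det)


# Per-state reward and cost values of each deterministic stationary policy.
POLICIES = list(product(range(2), repeat=2))
VALUES = {pi: (evaluate(pi, R), evaluate(pi, C)) for pi in POLICIES}


def dot(beta, v):
    return beta[0] * v[0] + beta[1] * v[1]


def optimal_value(beta):
    """Best expected reward over mixtures of deterministic policies with cost <= BUDGET."""
    pts = [(dot(beta, vr), dot(beta, vc)) for vr, vc in VALUES.values()]
    # Case 1: a single deterministic policy that respects the budget.
    best = max(r for r, c in pts if c <= BUDGET)
    # Case 2: mix a within-budget policy with an over-budget one so the cost equals BUDGET.
    for r_lo, c_lo in pts:
        for r_hi, c_hi in pts:
            if c_lo <= BUDGET < c_hi:
                t = (BUDGET - c_lo) / (c_hi - c_lo)
                best = max(best, (1 - t) * r_lo + t * r_hi)
    return best


beta0, beta1, mid = (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)
v0, v1, vm = optimal_value(beta0), optimal_value(beta1), optimal_value(mid)
print("V*(beta0) =", v0)
print("V*(beta1) =", v1)
print("V*(midpoint) =", vm, ">= average of endpoints =", 0.5 * (v0 + v1))
```

Because the initial distribution enters the underlying LP only through the right-hand side of the flow constraints, the optimal value of this maximization problem is concave in $\beta$, so the midpoint value printed above should be no smaller than the average of the endpoint values. The brute-force enumeration is only practical for toy instances like this one.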

Paper Structure

This paper contains 16 sections, 8 theorems, 35 equations, 1 figure, 3 tables.

Key Result

Theorem 1

Let $\beta_0$ be the nominal initial distribution for which the CMDP is solved, with an optimal dual solution $\lambda^*(\beta_0)$ and $W^*(\beta_0)$. For any initial distribution $\beta_1$, the value $V^*(\beta_1)$ satisfies:

Figures (1)

  • Figure 1: Water pendulum. The objective is to stabilize the pendulum in the upright position, perpendicular to the surface, with both angular position $\theta$ and angular velocity $\dot{\theta}$ equal to zero. The blue region represents water.

Theorems & Definitions (16)

  • Definition 1
  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • Theorem 3
  • proof
  • Remark 2
  • Theorem 4
  • Theorem 5
  • ...and 6 more