Computational Hardness of Static Distributionally Robust Markov Decision Processes

Yan Li

TL;DR

The paper analyzes the static distributionally robust MDP with a two-kernel ambiguity set, proving that finding optimal policies is NP-hard for non-randomized Markovian policies and that the robust objective can exhibit suboptimal strict local minima under randomized Markovian policies. Robust evaluation of a fixed policy remains polynomial-time since it reduces to solving two linear systems, but the overall min-max optimization is intractable due to non-rectangularity and combinatorial reductions to partition problems. The author contrasts the static formulation with a dynamic, history-dependent variant and shows that dynamic programming re-emerges only under a rectangularized, convex ambiguity set. The paper also discusses extensions to infinite-horizon discounting and larger kernel sets, connecting the hardness results to SAT reductions and highlighting open questions for other criteria.
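The gap between cheap evaluation and hard optimization can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a discounted infinite-horizon setting with stationary deterministic policies, a cost-minimizing controller, and hypothetical names (`policy_value`, `robust_value`, `brute_force_robust_policy`, `gamma`, `rho`). Evaluating one policy takes one linear solve per kernel, while the naive search enumerates all $|A|^{|S|}$ deterministic policies.

```python
import itertools
import numpy as np


def policy_value(P, c, policy, gamma, rho):
    """Expected discounted cost of a deterministic stationary policy under one kernel.

    P: (S, A, S) transition kernel, c: (S, A) costs,
    policy: length-S array of action indices, rho: initial state distribution.
    """
    S = c.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]  # (S, S) transition matrix induced by the policy
    c_pi = c[idx, policy]  # (S,) per-state cost induced by the policy
    # Solve (I - gamma * P_pi) v = c_pi: one linear system per kernel.
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    return float(rho @ v)


def robust_value(kernels, c, policy, gamma, rho):
    """Worst case over a finite ambiguity set (two kernels means two solves)."""
    return max(policy_value(P, c, policy, gamma, rho) for P in kernels)


def brute_force_robust_policy(kernels, c, gamma, rho):
    """Enumerate all A**S deterministic policies; exponential in the number of states."""
    S, A = c.shape
    best_policy, best_value = None, np.inf
    for policy in itertools.product(range(A), repeat=S):
        value = robust_value(kernels, c, np.array(policy), gamma, rho)
        if value < best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value
```

The enumeration is only meant to show where the cost arises. Because the adversary commits to a single kernel for the whole trajectory, the worst case cannot be split state by state, so the usual Bellman recursion does not apply directly to this objective.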

Abstract

We present some hardness results on finding the optimal policy for the static formulation of distributionally robust Markov decision processes. We construct problem instances such that when the considered policy class is Markovian and non-randomized, finding the optimal policy is NP-hard, and when the considered policy class is Markovian and randomized, the robust value function possesses sub-optimal strict local minima. The considered hard instances involve an ambiguity set with only two transition kernels.

Paper Structure

This paper contains 4 sections, 2 theorems, and 22 equations.

Key Result

Theorem 2.1

Solving the static robust MDP with a two-kernel ambiguity set is NP-hard when $\Pi = \Pi_{\mathrm{MD}}$, the class of Markovian non-randomized policies.
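
For orientation, the problem in Theorem 2.1 can be read as a min-max program of roughly the following shape. This is a sketch only: the horizon $T$, the cost functions $c_t$, and the kernel names $P_1, P_2$ are notational assumptions and are not taken from the paper's own equation.

$$
\min_{\pi \in \Pi} \; \max_{P \in \{P_1, P_2\}} \; \mathbb{E}^{\pi, P}\!\left[ \sum_{t=1}^{T} c_t(s_t, a_t) \right]
$$

In the static formulation the adversary selects one kernel $P$ for the entire trajectory, so the inner maximum sits outside the expectation and cannot be taken separately at each state or time step.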

Theorems & Definitions (4)

  • Remark 1.1
  • Theorem 2.1
  • Theorem 2.2
  • Remark 2.1