Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes

Jiping Luo, Nikolaos Pappas

Abstract

Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. Much of the literature approximates this front via scalarization into single-objective MDPs. Recent work has begun to characterize the full front in discounted or simple bi-objective settings by exploiting its geometry. In this work, we characterize the exact front in average-cost MOMDPs. We show that the front is a continuous, piecewise-linear surface lying on the boundary of a convex polytope. Each vertex corresponds to a deterministic policy, and adjacent vertices differ in exactly one state. Each edge is realized as a convex combination of the policies at its endpoints, with the mixing coefficient given in closed form. We apply these results to a remote state estimation problem, where each vertex on the front corresponds to a threshold policy. The exact Pareto front and solutions to certain non-convex MDPs can be obtained without explicitly solving any MDP.
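The structural claims above (vertices are deterministic policies, adjacent vertices differ in one state) can be checked by brute force on a toy problem. The sketch below is not the paper's algorithm. It assumes a hypothetical two-state, two-action ergodic MDP with made-up transition probabilities and costs, enumerates all deterministic policies, computes their average-cost vectors from the stationary distribution, and discards the dominated ones. The survivors are only candidate vertices, since a non-dominated deterministic policy can in general still lie above the convex hull of the others.

```python
import itertools
import numpy as np

# Hypothetical MDP; every number below is an illustrative assumption.
# P[a, s] is the next-state distribution when action a is taken in state s.
P = np.array([
    [[0.6, 0.4], [0.2, 0.8]],  # action 0 (idle)
    [[0.9, 0.1], [0.8, 0.2]],  # action 1 (transmit)
])
n_actions, n_states, _ = P.shape


def cost(s, a):
    """Vector cost: (transmission cost, distortion cost)."""
    return np.array([float(a == 1), float(s == 1)])


def stationary(P_pi):
    """Stationary distribution of an ergodic chain: solve pi P = pi, sum(pi) = 1."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]


def average_cost(policy):
    """Average-cost vector of a deterministic policy (tuple mapping state -> action)."""
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    pi = stationary(P_pi)
    return sum(pi[s] * cost(s, policy[s]) for s in range(n_states))


def dominates(p, q, tol=1e-9):
    """True if p is no worse than q in every objective and strictly better in one."""
    return bool(np.all(p <= q + tol) and np.any(p < q - tol))


# Enumerate all deterministic policies and keep the non-dominated ones.
points = {
    pol: average_cost(pol)
    for pol in itertools.product(range(n_actions), repeat=n_states)
}
front = sorted(
    pol for pol, p in points.items()
    if not any(dominates(q, p) for q in points.values())
)
```

In this toy instance the surviving policies, ordered by transmission cost, are `(0, 0)`, `(0, 1)`, and `(1, 1)`, so consecutive policies differ in exactly one state. Enumeration scales exponentially in the number of states, so it serves only as a sanity check on small examples.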

Paper Structure

This paper contains 17 sections, 10 theorems, 60 equations, 3 figures.

Key Result

Lemma 1

$\Gamma_{\beta}^\pi$ is non-empty for all $\pi \in \Pi$. The largest and smallest subsequential limits are $\sup(\Gamma_{\beta}^\pi) = \limsup_{t\to\infty}\mu_{\beta,\pi}^t$ and $\inf(\Gamma_{\beta}^\pi) = \liminf_{t\to\infty}\mu_{\beta,\pi}^t$, respectively. Moreover, the ordinary limit $\lim_{t\to\infty}\mu_{\beta,\pi}^t$ exists if and only if $\Gamma_{\beta}^\pi$ is a singleton, i.e., $\sup(\Gamma_{\beta}^\pi) = \inf(\Gamma_{\beta}^\pi) = \mu_{\beta, \pi}$. $\blacktriangleleft$
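
As an illustration only, consider a hypothetical real-valued sequence that is not taken from the paper. The generic symbols $a_t$ and $\Gamma$ below stand in for $\mu_{\beta,\pi}^t$ and $\Gamma_{\beta}^\pi$ under that assumption. The example shows a set of subsequential limits that is non-empty but not a singleton, so the ordinary limit fails to exist.

```latex
% Hypothetical oscillating sequence (an assumption for illustration).
a_t = \tfrac{1}{2}\bigl(1 + (-1)^t\bigr), \qquad t = 1, 2, \dots
% Even-indexed terms equal 1 and odd-indexed terms equal 0, so
\Gamma = \{0, 1\}, \qquad
\sup(\Gamma) = \limsup_{t\to\infty} a_t = 1, \qquad
\inf(\Gamma) = \liminf_{t\to\infty} a_t = 0.
% Since \sup(\Gamma) \neq \inf(\Gamma), the limit \lim_{t\to\infty} a_t does not exist.
```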

Figures (3)

  • Figure C1: Remote state estimation of a linear Gaussian process.
  • Figure D1: Pareto front of the estimation system.
  • Figure D2: Achievable total cost for the nonlinear scalarized problem.

Theorems & Definitions (17)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Definition 3
  • Theorem 1
  • Lemma 4
  • Theorem 2
  • Definition 4
  • ...and 7 more