How to Find the Exact Pareto Front for Multi-Objective MDPs?

Yining Li, Peizhong Ju, Ness B. Shroff

TL;DR

This paper addresses exact Pareto-front discovery in multi-objective MDPs by revealing that the front lies on the boundary of a convex polytope whose vertices are deterministic policies and where neighboring vertices differ by a single state-action pair. It then introduces an efficient algorithm that initializes from a single-objective solution and traverses the front by exploring distance-1 neighboring deterministic policies, constructing local convex hulls, and extracting Pareto-optimal faces to enumerate all front vertices. Theoretical results establish a distance-1 property, edge-sufficiency, and locality, justifying an edge-based front traversal that scales better than prior methods that exhaustively explore the preference space. Empirical evaluation demonstrates exact front recovery and improved efficiency over benchmarks and OLS in small MO-MDPs, suggesting practical relevance for fast Pareto-aware decision-making in settings with changing preferences.

Abstract

Multi-Objective Markov Decision Processes (MO-MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding Pareto optimal solutions that can efficiently adapt to various preferences. However, finding the Pareto front is a highly challenging problem. Most existing methods either (i) rely on traversing the continuous preference space, which is impractical and results in approximations that are difficult to evaluate against the true Pareto front, or (ii) focus solely on deterministic Pareto optimal policies, from which there are no known techniques to characterize the full Pareto front. Moreover, finding the structure of the Pareto front itself remains unclear even in the context of dynamic programming, where the MDP is fully known in advance. In this work, we address the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDPs, we uncover a key property: the Pareto front is on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and neighboring vertices of the Pareto front differ by only one state-action pair of the deterministic policy, almost surely. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair, drastically reducing the complexity of searching for the exact Pareto front. We develop an efficient algorithm that identifies the vertices of the Pareto front by solving a single-objective MDP only once and then traversing the edges of the Pareto front, making it more efficient than existing methods.
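To make the "localized search" concrete, the sketch below enumerates the deterministic policies that differ from a given one in exactly one state. It is an illustration only, not the authors' algorithm: the tuple encoding of a policy and the name `distance_one_neighbors` are assumptions introduced here, and the sizes $S=4$, $A=3$ are borrowed from Figure 4.

```python
from typing import Iterator, Tuple

# Hypothetical encoding: policy[s] is the action chosen in state s.
Policy = Tuple[int, ...]


def distance_one_neighbors(policy: Policy, num_actions: int) -> Iterator[Policy]:
    """Yield every deterministic policy that differs from `policy` in exactly one state."""
    for state, current_action in enumerate(policy):
        for action in range(num_actions):
            if action != current_action:
                # Swap the action in a single state, keep all other states unchanged.
                yield policy[:state] + (action,) + policy[state + 1:]


# S = 4 states, A = 3 actions.
start: Policy = (0, 0, 0, 0)
neighbors = list(distance_one_neighbors(start, num_actions=3))
print(len(neighbors), 3 ** 4)  # 8 81
```

A deterministic policy has $S(A-1)$ distance-1 neighbors, while there are $A^S$ deterministic policies in total, so each local step inspects 8 candidates here instead of 81.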

Paper Structure

This paper contains 32 sections, 35 theorems, 76 equations, 8 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

In a discounted finite MO-MDP, under the initial-coverage assumption, $\mathbb{J}$ is a closed convex polytope, and its vertices are achieved by deterministic policies.
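As a small numerical illustration of the lemma, the sketch below evaluates every deterministic policy of a toy two-state, two-action, two-objective MDP and keeps the return vectors that no other deterministic policy dominates. Everything in it is an assumption made for illustration: the transition table `P`, rewards `R`, discount `GAMMA`, and initial distribution `MU` are invented, $\mathbb{J}$ is read as the set of achievable expected discounted return vectors (the excerpt does not define it), and brute-force enumeration with a pairwise dominance filter is a simplification, not the paper's traversal algorithm.

```python
from itertools import product

GAMMA = 0.5
MU = [0.5, 0.5]  # hypothetical initial state distribution
# P[s][a][s2]: action 0 stays in the current state, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
# R[s][a] is a hypothetical 2-objective reward vector.
R = [[(1.0, 0.0), (0.0, 0.5)],
     [(0.0, 1.0), (0.2, 0.2)]]


def evaluate(policy, sweeps=200):
    """Return J(policy) = sum_s mu(s) V(s), with V found by fixed-point iteration."""
    n_states, n_obj = len(P), len(R[0][0])
    V = [[0.0] * n_obj for _ in range(n_states)]
    for _ in range(sweeps):
        # One synchronous Bellman backup per objective under the fixed policy.
        V = [[R[s][policy[s]][d]
              + GAMMA * sum(P[s][policy[s]][s2] * V[s2][d] for s2 in range(n_states))
              for d in range(n_obj)]
             for s in range(n_states)]
    return tuple(sum(MU[s] * V[s][d] for s in range(n_states)) for d in range(n_obj))


def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


# Evaluate all A^S = 4 deterministic policies, then filter by pairwise dominance.
returns = {pi: evaluate(pi) for pi in product(range(2), repeat=2)}
non_dominated = {pi: J for pi, J in returns.items()
                 if not any(dominates(other, J) for other in returns.values())}
```

In this toy instance the policies `(0, 0)`, `(0, 1)`, and `(1, 0)` survive the filter, and the two outer ones each differ from `(0, 0)` in a single state. In general a pairwise filter can also keep points that a mixture of other policies dominates, so it only over-approximates the set of front vertices.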

Figures (8)

  • Figure 1: Finding Pareto front and Pareto front vertices in MO-MDP
  • Figure 2: Illustrations of the steps for finding Pareto-optimal front at each iteration. $S=5$, $A=5$, and $D=3$.
  • Figure 3: Convex polytope and its Pareto front (red edge and plane)
  • Figure 4: Pareto front of a simple MDP with $S=4$, $A=3$, and $D=3$.
  • Figure 5: Comparison between the proposed Pareto front searching algorithm and the benchmark algorithm when $D=3$.
  • ...and 3 more figures

Theorems & Definitions (60)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Theorem 2
  • Lemma 3
  • Proposition 1
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • proof
  • ...and 50 more