Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning

Alperen Tercan, Vinayak S. Prabhu

TL;DR

This work investigates further shortcomings of existing reinforcement learning approaches to lexicographic multi-objective tasks, proposes fixes that improve practical performance in many cases, and presents a policy optimization approach based on the authors' Lexicographic Projection Optimization (LPO) algorithm, which has the potential to address both the theoretical and the practical concerns.

Abstract

Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The few proposed approaches were all noted to be heuristics without theoretical guarantees as the Bellman equation is not applicable to them. Additionally, the practical applicability of these prior approaches also suffers from various issues such as not being able to reach the goal state. While some of these issues have been known before, in this work we investigate further shortcomings, and propose fixes for improving practical performance in many cases. We also present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns. Finally, we demonstrate our proposed algorithms on benchmark problems.

Paper Structure

This paper contains 47 sections, 6 theorems, 53 equations, 16 figures, and 5 algorithms.

Key Result

Proposition 3.1

In addition to the previously identified failure case in vamplew2011empirical, TLQ does not work when the constrained objective is a terminating endpoint objective but the unconstrained one is non-terminating.
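
To make the object of this proposition concrete, the following is a minimal sketch of thresholded lexicographic action selection, the greedy step usually associated with TLQ. It assumes a common formulation in which each constrained objective's Q-value is clipped at its threshold before comparison and the last objective is left unconstrained. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence


def thresholded_lexicographic_action(
    q_values: Dict[Hashable, Sequence[float]],
    thresholds: Sequence[float],
) -> Hashable:
    """Pick an action by thresholded lexicographic comparison (illustrative only).

    q_values maps each action to its per-objective Q-values, most important
    objective first. thresholds holds one value per constrained objective;
    objectives beyond len(thresholds) are compared without clipping.
    """
    candidates: List[Hashable] = list(q_values)
    n_objectives = len(next(iter(q_values.values())))

    for i in range(n_objectives):
        scores = {}
        for action in candidates:
            q = q_values[action][i]
            # Clip constrained objectives: anything above the threshold is
            # treated as "good enough" and therefore as a tie.
            scores[action] = min(q, thresholds[i]) if i < len(thresholds) else q
        best = max(scores.values())
        # Keep only the actions that are best on this objective.
        candidates = [a for a in candidates if scores[a] == best]
        if len(candidates) == 1:
            break

    return candidates[0]


# Hypothetical Q-values for three actions and two objectives.
q = {"left": [0.9, 1.0], "right": [0.8, 5.0], "stay": [0.5, 9.0]}

# With a threshold of 0.7 on the first objective, "left" and "right" tie
# after clipping, so the second objective decides.
choice_with_threshold = thresholded_lexicographic_action(q, thresholds=[0.7])

# With a threshold no action reaches, the first objective decides alone.
choice_strict = thresholded_lexicographic_action(q, thresholds=[10.0])
```

In this sketch the threshold turns the strict ordering into a "satisfy first, then optimize" rule, which is the mechanism whose failure cases the proposition concerns.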

Figures (16)

  • Figure 1: A simple maze that demonstrates how TLQ fails to reach the goal state.
  • Figure 2: The changes in the function values. Notice that $F_2$, in orange, is ignored until the threshold for $F_1$ is reached. Then, $F_2$ is optimized while respecting the passed threshold of $F_1$.
  • Figure 3: The maze
  • Figure 4: Satisfaction rates for a single successful seed for the path objective maze experiment over 100 episodes.
  • Figure 5: Satisfaction rates for a single successful seed for the endpoint maze experiment over 100 episodes.
  • ...and 11 more figures
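
The behaviour described in the Figure 2 caption (the secondary function is ignored until the primary one passes its threshold, then optimized without giving that threshold back) can be imitated on a toy problem. The sketch below is not the authors' LPO algorithm; it is a simple projected-gradient illustration under assumed quadratic functions, and every name and constant in it (`TAU`, `LR`, `TARGET`, `f1`, `f2`, `optimize`) is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

TAU = -1.0            # assumed threshold on the primary function F1
LR = 0.1              # assumed step size
TARGET = (3.0, 0.0)   # assumed maximizer of the secondary function F2


def f1(p: Point) -> float:
    """Primary function: maximal at the origin."""
    return -(p[0] ** 2 + p[1] ** 2)


def grad_f1(p: Point) -> Point:
    return (-2.0 * p[0], -2.0 * p[1])


def f2(p: Point) -> float:
    """Secondary function: maximal at TARGET."""
    return -((p[0] - TARGET[0]) ** 2 + (p[1] - TARGET[1]) ** 2)


def grad_f2(p: Point) -> Point:
    return (-2.0 * (p[0] - TARGET[0]), -2.0 * (p[1] - TARGET[1]))


def step(p: Point, direction: Point) -> Point:
    return (p[0] + LR * direction[0], p[1] + LR * direction[1])


def project_out(g: Point, n: Point) -> Point:
    """Remove from g its component along n (orthogonal projection)."""
    norm_sq = n[0] ** 2 + n[1] ** 2
    if norm_sq == 0.0:
        return g
    coeff = (g[0] * n[0] + g[1] * n[1]) / norm_sq
    return (g[0] - coeff * n[0], g[1] - coeff * n[1])


def optimize(start: Point, iters: int = 200) -> Point:
    p = start
    for _ in range(iters):
        if f1(p) < TAU:
            # Threshold not yet reached: follow F1 only and ignore F2.
            p = step(p, grad_f1(p))
            continue
        # Threshold reached: try a plain ascent step on F2.
        trial = step(p, grad_f2(p))
        if f1(trial) < TAU:
            # The plain step would break the threshold, so retry with the
            # part of the F2 gradient that is orthogonal to the F1 gradient.
            trial = step(p, project_out(grad_f2(p), grad_f1(p)))
        if f1(trial) >= TAU:
            # Accept only steps that keep F1 at or above its threshold.
            p = trial
    return p


start = (-2.0, 1.0)
final = optimize(start)
```

Because steps are accepted only when they keep `f1` at or above `TAU`, the primary function never falls back below its threshold once it has passed it, which mirrors the two-phase curve the caption describes.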

Theorems & Definitions (9)

  • Proposition 3.1
  • Proposition 4.1
  • Remark 4.1
  • Lemma E.1
  • Proof E.1
  • Lemma G.1
  • Lemma G.2
  • Proof G.1
  • Theorem G.3