Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning

Alperen Tercan; Vinayak S. Prabhu

TL;DR

This work identifies further shortcomings of prior thresholded lexicographic Q-learning (TLQ) approaches, proposes fixes that improve practical performance in many cases, and presents a policy optimization approach based on the authors' Lexicographic Projection Optimization (LPO) algorithm, which has the potential to address both the theoretical and the practical concerns.

Abstract

Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The few proposed approaches were all noted to be heuristics without theoretical guarantees, as the Bellman equation is not applicable to them. Additionally, the practical applicability of these prior approaches suffers from various issues, such as failing to reach the goal state. While some of these issues were known before, in this work we investigate further shortcomings and propose fixes that improve practical performance in many cases. We also present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns. Finally, we demonstrate our proposed algorithms on benchmark problems.
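
To make the notion of a thresholded lexicographic order concrete, the following is a minimal sketch of greedy action selection in the style commonly associated with TLQ. It is an illustration under stated assumptions, not the authors' implementation: the function name `tlo_select`, the layout of `q_values`, and the rule of clamping every thresholded objective at its threshold before comparing are all assumptions introduced here.

```python
from typing import List, Sequence


def tlo_select(q_values: Sequence[Sequence[float]],
               thresholds: Sequence[float]) -> int:
    """Pick an action under a thresholded lexicographic order (hypothetical helper).

    q_values[a][i] is the estimated value of action a for objective i,
    with objectives listed from most to least important.
    thresholds[i] is the satisfaction level for objective i; objectives
    without a threshold (typically the last one) are simply maximized.
    """
    candidates: List[int] = list(range(len(q_values)))
    n_objectives = len(q_values[0])

    for i in range(n_objectives):
        if i < len(thresholds):
            # Assumption: values above the threshold are treated as equally good.
            scores = {a: min(q_values[a][i], thresholds[i]) for a in candidates}
        else:
            scores = {a: q_values[a][i] for a in candidates}

        best = max(scores.values())
        # Keep only the actions tied for best on this objective.
        candidates = [a for a in candidates if scores[a] == best]
        if len(candidates) == 1:
            break

    # Remaining ties are broken by the lowest action index.
    return candidates[0]


if __name__ == "__main__":
    q = [[0.9, 1.0], [0.8, 5.0], [0.3, 9.0]]
    print(tlo_select(q, thresholds=[0.7]))
    print(tlo_select(q, thresholds=[0.95]))
```

With a threshold of 0.7 on the first objective, actions 0 and 1 both count as satisfying it, so the second objective decides between them; with a threshold of 0.95 no action satisfies it, and the first objective alone decides.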

Paper Structure (47 sections, 6 theorems, 53 equations, 16 figures, 5 algorithms)

Related Work
Background
TLQ: Value Function Based Approaches for TLO
Shortcomings of Prior TLQ Approaches
Policy Gradient Approach for TLO
Experiments
Discussions and Comparison with Other Methods
Conclusion
Technical Appendix Organization
Further Details on Acceptable Policies
Issues with TLQ
Failing to Reach the Goal
Failure to Sacrifice Early and Late
Variations to TLQ and Some Alternatives
Failed Attempts
...and 32 more sections

Key Result

Proposition 3.1

In addition to the failure case previously identified by Vamplew et al. (2011), TLQ does not work when the constrained objective is a terminating endpoint objective but the unconstrained one is non-terminating.

Figures (16)

Figure 1: A simple maze that demonstrates how TLQ fails to reach the goal state.
Figure 2: The changes in the function values. Notice that $F_2$, in orange, is ignored until the threshold for $F_1$ is reached. Then, $F_2$ is optimized while respecting the passed threshold of $F_1$.
Figure 3: The maze
Figure 4: Satisfaction rates for a single successful seed for the path objective maze experiment over 100 episodes.
Figure 5: Satisfaction rates for a single successful seed for the endpoint maze experiment over 100 episodes.
...and 11 more figures

Theorems & Definitions (9)

Proposition 3.1
Proposition 4.1
Remark 4.1
Lemma E.1
Proof E.1
Lemma G.1
Lemma G.2
Proof G.1
Theorem G.3
