Demonstration Guided Multi-Objective Reinforcement Learning

Junlin Lu; Patrick Mannion; Karl Mason

TL;DR

This work introduces demonstration-guided multi-objective reinforcement learning (DG-MORL), a novel approach that utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations.

Abstract

Multi-objective reinforcement learning (MORL) is increasingly relevant because it resembles real-world scenarios that require trade-offs between multiple objectives. Because policies must cater to diverse user preferences, the challenges of traditional reinforcement learning are amplified in MORL. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound on the algorithm's sample complexity.
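
As a rough illustration of the "corner weight" idea mentioned in the abstract, the sketch below assumes a linear utility over two objectives, so each demonstration's return vector defines a line in the weight `w1`, and the corner weights are the points where the upper envelope of those lines changes slope (plus the two extremes). The function names `scalarise` and `corner_weights_2d` and the example return vectors are hypothetical and chosen for illustration only; this is not the paper's implementation.

```python
from itertools import combinations


def scalarise(value, w1):
    """Linear utility of a 2-objective return under the weight (w1, 1 - w1)."""
    return w1 * value[0] + (1.0 - w1) * value[1]


def corner_weights_2d(values, tol=1e-9):
    """Return the sorted w1 values where the upper envelope of the
    scalarised returns changes slope, together with the extremes 0 and 1.

    Assumption: linear utility and exactly two objectives.
    """
    candidates = {0.0, 1.0}
    for a, b in combinations(values, 2):
        # The two lines meet where w1*(a0 - a1) + a1 == w1*(b0 - b1) + b1.
        denom = (a[0] - a[1]) - (b[0] - b[1])
        if abs(denom) < tol:
            continue  # parallel lines never cross
        w1 = (b[1] - a[1]) / denom
        if 0.0 < w1 < 1.0:
            # Keep the crossing only if it lies on the upper envelope.
            best = max(scalarise(v, w1) for v in values)
            if abs(scalarise(a, w1) - best) < tol:
                candidates.add(round(w1, 9))
    return sorted(candidates)


def best_demo_per_weight(values, weights):
    """For each weight, pick the return vector with the highest utility."""
    return {w1: max(values, key=lambda v: scalarise(v, w1)) for w1 in weights}


# Hypothetical demonstration returns (objective 1, objective 2).
demo_returns = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]
corners = corner_weights_2d(demo_returns)
guides = best_demo_per_weight(demo_returns, corners)
```

In this toy setup, `guides` maps each corner weight to the demonstration that scores best under it, which is one plausible way to read "aligning demonstrations with user preferences"; the dominated vector `(0.2, 0.2)` never appears as a guide.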

Paper Structure (37 sections, 6 theorems, 16 equations, 20 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Multi-Objective Reinforcement Learning
Prior Demonstration Utilization Approaches
Effective Exploration Approaches
Preliminaries
Demonstration-Guided Multi-Objective Reinforcement Learning
Demonstrations Corner Weight Computation
Self-Evolving Mechanism
Multi-Stage Curriculum
Algorithm
Theoretical Analysis
Experiments
Benchmark Environments
Baseline Algorithms
...and 22 more sections

Key Result

Theorem 4.2

(Theorem 7 of Roijers, 2016) There is a corner weight $\bm{w}$ that maximizes:

Figures (20)

Figure 1: (a) Training process of traditional MORL; (b) Training process of demonstration-guided MORL
Figure 2: (a) Demonstration-guided MORL without self-evolving; (b) Demonstration-guided MORL with self-evolving
Figure 3: DST Expected Utility Result
Figure 4: Minecart Expected Utility Result
Figure 5: MO-Hopper Expected Utility Result
...and 15 more figures

Theorems & Definitions (9)

Definition 4.1
Theorem 4.2
Theorem 4.3
Theorem 4.4
Theorem 4.1
Corollary 4.3
Definition 4.4
Definition 4.5
Theorem 4.6