
Demonstration Guided Multi-Objective Reinforcement Learning

Junlin Lu, Patrick Mannion, Karl Mason

TL;DR

This work introduces demonstration-guided multi-objective reinforcement learning (DG-MORL), a novel approach that utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations.

Abstract

Multi-objective reinforcement learning (MORL) is increasingly relevant because many real-world problems require trade-offs between multiple objectives. Because MORL must cater to diverse user preferences, the challenges faced by traditional reinforcement learning are amplified. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach uses prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies show that DG-MORL outperforms existing MORL algorithms, demonstrating its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound on the algorithm's sample complexity.
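
The abstract describes the demonstration mechanism only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each demonstration is summarized by its vector return, that a user preference is a linear weight vector (such as a corner weight), that the guide for a weight is the demonstration with the highest scalarized utility, and that "self-evolving" means admitting an agent rollout into the demonstration pool when it beats the current guide. `Demo`, `utility`, `best_demo_for`, and `self_evolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Demo:
    # Hypothetical container: a demonstration summarized by its vector return
    name: str
    vector_return: Sequence[float]  # one entry per objective


def utility(weight: Sequence[float], vector_return: Sequence[float]) -> float:
    # Assumed linear scalarization: u(G) = w . G
    return sum(w * g for w, g in zip(weight, vector_return))


def best_demo_for(weight: Sequence[float], demos: List[Demo]) -> Demo:
    # Pick the demonstration whose return has the highest utility under `weight`
    return max(demos, key=lambda d: utility(weight, d.vector_return))


def self_evolve(weight: Sequence[float], demos: List[Demo], rollout: Demo) -> List[Demo]:
    # If the agent's own rollout beats the current guide under `weight`,
    # add it to the pool so that it is preferred as the guide from now on.
    guide = best_demo_for(weight, demos)
    if utility(weight, rollout.vector_return) > utility(weight, guide.vector_return):
        return demos + [rollout]
    return demos


demos = [Demo("short", [1.0, -1.0]), Demo("long", [10.0, -8.0])]
print(best_demo_for([0.3, 0.7], demos).name)  # short
print(best_demo_for([0.9, 0.1], demos).name)  # long
```

Different weights select different guides, which is the sense in which demonstrations are "aligned" with preferences in this sketch; a better agent rollout then displaces a suboptimal demonstration for that weight without discarding the demonstrations other weights still rely on.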

Paper Structure

This paper contains 37 sections, 6 theorems, 16 equations, 20 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 4.2

(Theorem 7 of Roijers, 2016) There is a corner weight $\bm{w}$ that maximizes:
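
To make the theorem concrete, here is a minimal sketch restricted to two objectives, where a weight is $\bm{w} = (w_1, 1 - w_1)$. It assumes the standard setting of Roijers' optimistic linear support (OLS): $u^*_S(\bm{w}) = \max_{\bm{v} \in S} \bm{w} \cdot \bm{v}$ is the scalarized value of a partial set $S$ of value vectors, the corner weights are the simplex extrema plus the weights where the maximizing vector of $S$ changes, and the maximized quantity is the improvement $\Delta(\bm{w}) = u^*_{CCS}(\bm{w}) - u^*_S(\bm{w})$. That formulation is an assumption about the expression the theorem refers to, and all function names are illustrative. For simplicity the example uses the true convex coverage set (CCS), which OLS does not have in practice.

```python
import itertools
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]  # value vector for two objectives


def scalarized(w1: float, v: Vec) -> float:
    # Weight w = (w1, 1 - w1) lies on the 2-objective weight simplex
    return w1 * v[0] + (1.0 - w1) * v[1]


def u_star(w1: float, vectors: Sequence[Vec]) -> float:
    # Upper envelope: best scalarized value any vector in the set reaches at w
    return max(scalarized(w1, v) for v in vectors)


def corner_weights(vectors: Sequence[Vec], tol: float = 1e-9) -> List[float]:
    corners = {0.0, 1.0}  # the extrema of the simplex are always corners
    for a, b in itertools.combinations(vectors, 2):
        # Solve scalarized(w1, a) == scalarized(w1, b) for w1
        denom = (a[0] - a[1]) - (b[0] - b[1])
        if abs(denom) < tol:
            continue  # parallel lines: no crossing
        w1 = (b[1] - a[1]) / denom
        # Keep the crossing only if it lies on the upper envelope
        if 0.0 < w1 < 1.0 and u_star(w1, vectors) - scalarized(w1, a) < tol:
            corners.add(w1)
    return sorted(corners)


def improvement(w1: float, ccs: Sequence[Vec], partial: Sequence[Vec]) -> float:
    # Delta(w): how much the full CCS beats the partial set S at weight w
    return u_star(w1, ccs) - u_star(w1, partial)


partial = [(1.0, 0.0), (0.0, 1.0)]
ccs = partial + [(0.7, 0.7)]
corners = corner_weights(partial)
print(corners)  # [0.0, 0.5, 1.0]
best_corner = max(corners, key=lambda w1: improvement(w1, ccs, partial))
print(best_corner)  # 0.5
```

The reasoning behind the theorem shows up in this example: between two consecutive corner weights $u^*_S$ is linear while $u^*_{CCS}$ is convex, so $\Delta$ is convex on each interval and attains its maximum at an endpoint, that is, at a corner weight. Searching only the corner weights therefore finds the same maximum as a dense grid over the simplex.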

Figures (20)

  • Figure 1: (a) Training process of traditional MORL; (b) Training process of demonstration-guided MORL
  • Figure 2: (a) Demonstration-guided MORL without self-evolving; (b) Demonstration-guided MORL with self-evolving
  • Figure 3: DST Expected Utility Result
  • Figure 4: Minecart Expected Utility Result
  • Figure 5: MO-Hopper Expected Utility Result
  • ...and 15 more figures
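
Figures 3 to 5 report expected utility (EU). A definition commonly used in the MORL literature averages, over a distribution of preference weights, the best linearly scalarized value that any policy in the learned set reaches. The sketch below assumes that definition and uniformly sampled weights; the exact evaluation protocol behind these figures may differ, and `expected_utility` and `sample_simplex_weights` are illustrative names.

```python
import random
from typing import List, Sequence


def expected_utility(value_vectors: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]]) -> float:
    # EU = average over preference weights w of max_{v in set} w . v
    total = 0.0
    for w in weights:
        total += max(sum(wi * vi for wi, vi in zip(w, v)) for v in value_vectors)
    return total / len(weights)


def sample_simplex_weights(n: int, k: int, seed: int = 0) -> List[List[float]]:
    # Normalized i.i.d. Exp(1) draws are uniform on the (k-1)-simplex
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        raw = [rng.expovariate(1.0) for _ in range(k)]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights


# Two extreme policies evaluated at three hand-picked weights: (1 + 1 + 0.5) / 3
print(expected_utility([(1.0, 0.0), (0.0, 1.0)], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))
```

Under this definition, adding a policy to the set can never lower EU for a fixed set of weights, so a higher curve indicates a set that serves the sampled preferences at least as well.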

Theorems & Definitions (9)

  • Definition 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.1
  • Corollary 4.3
  • Definition 4.4
  • Definition 4.5
  • Theorem 4.6