A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

TL;DR

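Real-world decision problems typically involve multiple objectives, so designing a single scalar reward is often insufficient. The paper advocates a utility-based framework for multi-objective reinforcement learning (MORL), formalizes multi-objective Markov decision processes (MOMDPs), and clarifies solution concepts (Pareto front, PF; convex hull, CH; convex coverage set, CCS; Pareto coverage set, PCS) under the expected scalarised returns (ESR) and scalarised expected returns (SER) criteria. It surveys planning and RL algorithms, discusses evaluation metrics (hypervolume, the ε-metric, the expected utility metric (EUM), and maximum utility loss (MUL)), and works through a water-reservoir example using multi-objective natural evolution strategies (MONES) to illustrate practical benefits. It also highlights open challenges, such as benchmarks, many-objective settings, and dynamic identification of objectives, to guide future MORL research and deployment.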

Abstract

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

Paper Structure

This paper contains 54 sections, 24 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Work flow diagram for multi-objective reinforcement learning and planning.
  • Figure 2: The six motivating scenarios for MOMDPs: (a) the unknown utility function scenario, (b) the decision support scenario, (c) the known utility function scenario, (d) the interactive decision support scenario, (e) the dynamic utility function scenario, and (f) the review and adjust scenario.
  • Figure 3: Multi-objective multi-agent decision making taxonomy and mapping of solution concepts (Rădulescu et al., 2020).
  • Figure 4: Left: A graphical illustration of the hypervolume for a 2-objective problem, where both objectives are to be maximised. Solutions in red form the undominated set, while solutions in black are said to be dominated. The shaded area denotes the hypervolume of the undominated set with respect to the reference point (shown in blue). Right: The effect of adding two new points (shown in green) to the undominated set.
  • Figure 5: Comparison of returns (left) with non-dominated returns (right). In order to enhance presentation, the right plot's horizontal axis was clipped to a smaller interval.
  • ...and 2 more figures
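
The hypervolume described in the Figure 4 caption can be made concrete with a small sketch. It assumes two objectives, both maximised, and a reference point that every counted solution strictly exceeds. The names `non_dominated` and `hypervolume_2d` are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def non_dominated(points: List[Point]) -> List[Point]:
    """Return the points that no other point Pareto-dominates (maximisation)."""
    front: List[Point] = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated and p not in front:  # also drops exact duplicates
            front.append(p)
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the undominated set, measured from the reference point."""
    # Only points strictly better than the reference point contribute area.
    candidates = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    front = non_dominated(candidates)
    # Sorted by the first objective (descending), the second objective is
    # strictly ascending along the front, so each point adds one new strip.
    front.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


if __name__ == "__main__":
    undominated = [(1, 3), (2, 2), (3, 1)]
    print(hypervolume_2d(undominated, ref=(0, 0)))
    # A dominated point leaves the value unchanged; a new undominated point raises it.
    print(hypervolume_2d(undominated + [(1, 1)], ref=(0, 0)))
    print(hypervolume_2d(undominated + [(3, 3)], ref=(0, 0)))
```

As in the right-hand panel of Figure 4, adding a point to the undominated set can only keep the hypervolume the same or increase it, which is why the metric depends on the chosen reference point and should be reported together with it.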

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6