Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework

Florian Felten; El-Ghazali Talbi; Grégoire Danoy

TL;DR

This work introduces MORL/D, a taxonomy and framework that bridges reinforcement learning (RL) and multi-objective optimization based on decomposition (MOO/D). It frames the search for a Pareto set of policies in multi-objective Markov decision processes (MOMDPs) as a set of scalarized subproblems, drawing on techniques from both RL and MOO/D. The taxonomy categorizes existing MORL literature along dimensions such as scalarization, cooperation, and regression structure, while the MORL/D framework supports flexible instantiations (e.g., shared representations, weight adaptation, and buffer strategies), demonstrated on MO-Gymnasium benchmarks. Experiments on mo-halfcheetah-v4 and deep-sea-treasure-concave-v0 show that MORL/D variants can match or exceed state-of-the-art performance, including learning points in concave regions of the Pareto front (PF). The authors also discuss future directions such as non-linear scalarization and automated MORL/D design for broader applicability.
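To illustrate why concave regions of the Pareto front matter for the choice of scalarization, the following is a minimal sketch and not the authors' implementation. It compares a weighted sum with a weighted Tchebycheff function, used here only as one common non-linear scalarization; the summary above does not say which function the paper uses. The three return vectors in `front`, the weight grid, and all function names are hypothetical and chosen for illustration.

```python
import numpy as np

def weighted_sum(values, weights):
    """Linear scalarization (maximization: higher is better)."""
    return values @ weights

def tchebycheff(values, weights, ideal):
    """Weighted Tchebycheff distance to the ideal point, negated so higher is better."""
    return -np.max(weights * np.abs(ideal - values), axis=-1)

# Hypothetical two-objective returns of three policies (maximization).
# The middle point is Pareto-optimal but lies in a concave region:
# it sits below the segment joining the two extreme points.
front = np.array([[1.0, 0.0], [0.4, 0.4], [0.0, 1.0]])
ideal = front.max(axis=0)

# Eleven evenly spaced weight vectors, one per scalarized subproblem.
weight_grid = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]

# Index of the policy each subproblem prefers under each scalarization.
ws_picks = {int(np.argmax(weighted_sum(front, w))) for w in weight_grid}
tch_picks = {int(np.argmax(tchebycheff(front, w, ideal))) for w in weight_grid}

print("weighted sum selects:", sorted(ws_picks))
print("Tchebycheff selects: ", sorted(tch_picks))
```

On this toy front, no weight vector makes the weighted sum prefer the middle policy, because its scalarized value is always 0.4 while one of the extremes reaches at least 0.5. The Tchebycheff variant selects it for balanced weights.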

Abstract

Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate MORL/D instantiations achieve comparable performance to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL.

Paper Structure (129 sections, 12 equations, 16 figures, 2 tables, 3 algorithms)

Introduction
Reinforcement Learning
Regression Structure
Policy Evaluation and Improvement
Buffer Strategy
Replacement.
Selection.
Sampling Strategy
Summary
Multi-Objective Optimization Based on Decomposition
Solution concepts.
Approximated optimization methods.
Single solution and population-based methods.
Decomposition.
Scalarization Functions
...and 114 more sections

Figures (16)

Figure 1: Reinforcement learning: design choices
Figure 2: Illustration of Pareto front approximations. In the left part, the convergence aspect of the approximated front is represented by the arrows. In the right part, the diversity aspect is represented by the arrows.
Figure 3: The decomposition in the objective space idea: split the multi-objective problem into various single-objective problems $sp_n$ by relying on a scalarization function (weighted sum in this case). $sp_1$, $sp_2$, and $sp_3$ are considered to be neighbors since their associated weight vectors are close to each other while $sp_4$ is not considered to be in the neighborhood.
Figure 4: Design choices of multi-objective optimization based on decomposition (MOO/D).
Figure 5: The decomposition idea applied to MORL. Blue parts emphasize the parts coming from RL, while black parts come from MOO. The optimization is looking for the best parameters for the regression structure to generate good policies. The idea of neighbor policies is that policies that have similar parameters should lead to close evaluations.
...and 11 more figures
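
The decomposition idea described in the Figure 3 caption can be sketched as follows. This is an illustrative example under stated assumptions, not code from the paper: it assumes two objectives, evenly spaced weight vectors, and neighborhoods defined by Euclidean distance between weight vectors. The helper names `uniform_weights` and `neighborhoods` are hypothetical.

```python
import numpy as np

def uniform_weights(n_subproblems):
    """Evenly spaced two-objective weight vectors, each summing to 1."""
    w1 = np.linspace(0.0, 1.0, n_subproblems)
    return np.stack([w1, 1.0 - w1], axis=1)

def neighborhoods(weights, k):
    """For each subproblem, indices of the k closest other weight vectors."""
    # Pairwise Euclidean distances between all weight vectors.
    diffs = weights[:, None, :] - weights[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Column 0 of the sorted order is the subproblem itself (distance 0), so skip it.
    order = np.argsort(dists, axis=1, kind="stable")
    return order[:, 1:k + 1]

weights = uniform_weights(5)       # five single-objective subproblems
nb = neighborhoods(weights, k=2)   # two neighbors per subproblem

for i, (w, n) in enumerate(zip(weights, nb)):
    print(f"subproblem {i}: weights={w}, neighbors={n.tolist()}")
```

Each row of `weights` defines one scalarized subproblem. Subproblems whose weight vectors are close are treated as neighbors, while a subproblem at the opposite end of the weight range is not, mirroring the role of $sp_4$ in the caption.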
