Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning

Giseung Park, Youngchul Sung

TL;DR

This work tackles the scalability challenge of multi-objective reinforcement learning (MORL) by introducing online reward dimension reduction that maps high-dimensional reward vectors $r\in\mathbb{R}^K$ to a lower-dimensional space $f(r)\in\mathbb{R}^m$ while preserving Pareto-optimality. The authors propose a training and evaluation framework together with an affine, positive, row-stochastic transformation $f(r)=Ar+b$ (with a reconstruction network $g_\phi$ and a softmax parameterization of $A$) that enables online training of multi-policy MORL in the reduced reward space via $Q_m^*(s,a,\omega_m)$ for all $\omega_m\in\Delta^m$. They evaluate on LunarLander-5D and a 16-objective traffic-light control task, showing substantial improvements in hypervolume and sparsity over baselines such as online AE, online PCA, and NPCA, and they provide ablations to justify the design constraints. The results indicate that reward dimension reduction can scale MORL to high-dimensional objective spaces, with practical implications for complex, real-world control problems. The work also includes a reproducibility statement and public code, supporting further exploration and extension in high-dimensional MORL contexts.
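
As a rough illustration of the transformation described above, the following NumPy sketch builds a positive, row-stochastic matrix $A$ from unconstrained logits with a softmax and applies $f(r)=Ar+b$. The softmax axis, the function names `row_softmax` and `reduce_reward`, and the sizes `K = 16`, `m = 4` are assumptions made for this example; the summary does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def row_softmax(logits):
    """Softmax over each row: every entry is positive and each row sums to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def reduce_reward(r, logits, b):
    """Affine map f(r) = A r + b with A = row_softmax(logits)."""
    A = row_softmax(logits)
    return A @ r + b

# Hypothetical sizes: K original objectives reduced to m dimensions.
K, m = 16, 4
rng = np.random.default_rng(0)
logits = rng.normal(size=(m, K))  # unconstrained parameters behind A
b = np.zeros(m)                   # offset, fixed to zero for illustration
r = rng.normal(size=K)            # one sampled reward vector

A = row_softmax(logits)
reduced = reduce_reward(r, logits, b)
print(A.shape, reduced.shape)
```

Parameterizing $A$ through a softmax keeps its entries strictly positive for any value of the logits, so the positivity constraint holds throughout gradient-based training without explicit projection.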

Abstract

In this paper, we introduce a simple yet effective reward dimension reduction method to tackle the scalability challenges of multi-objective reinforcement learning algorithms. While most existing approaches focus on optimizing two to four objectives, their abilities to scale to environments with more objectives remain uncertain. Our method uses a dimension reduction approach to enhance learning efficiency and policy performance in multi-objective settings. While most traditional dimension reduction methods are designed for static datasets, our approach is tailored for online learning and preserves Pareto-optimality after transformation. We propose a new training and evaluation framework for reward dimension reduction in multi-objective reinforcement learning and demonstrate the superiority of our method in environments including one with sixteen objectives, significantly outperforming existing online dimension reduction methods.

Paper Structure

This paper contains 21 sections, 1 theorem, 13 equations, 5 figures, 11 tables.

Key Result

Theorem 1

If $f$ is affine, i.e. $f(r)=Ar+b$, and each element of the matrix $A$ is positive, then the Pareto-optimality preservation condition (eq:pareto_preserve) is satisfied.
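
The equation referenced by the theorem is not reproduced on this page, so the following is only a sketch of why positivity matters. It assumes that the condition requires Pareto dominance in the original space to carry over to the reduced space, and it writes the affine map as $f(r)=Ar+b$ with $A\in\mathbb{R}^{m\times K}$, as in the TL;DR. It is not the authors' proof.

```latex
\begin{aligned}
&\text{Let } r, r' \in \mathbb{R}^K \text{ with } r_j \ge r'_j \text{ for all } j
  \text{ and } r_k > r'_k \text{ for some } k. \\
&\text{For every row } i \in \{1,\dots,m\}: \\
&\quad [f(r) - f(r')]_i
  = \sum_{j=1}^{K} A_{ij}\,(r_j - r'_j)
  \;\ge\; A_{ik}\,(r_k - r'_k) \;>\; 0 .
\end{aligned}
```

The offset $b$ cancels in the difference, and every term of the sum is non-negative because $A_{ij}>0$. Hence $f(r)$ strictly exceeds $f(r')$ in every reduced coordinate, and, by contraposition, a vector that is not dominated after the transformation cannot be dominated before it. Extending this from reward vectors to vector-valued returns relies on the offset contributing equally to the policies being compared, which is an assumption here.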

Figures (5)

  • Figure 1: Comparison of the Pareto frontier $\mathcal{F}$ and the CCS for $K=2$, where $C'$ and $D'$ represent the projections of points $C$ and $D$ onto the preference vector $\omega$, respectively. Yellow dashed line represents the outer convex boundary of $\mathcal{F}$.
  • Figure 2: Evaluation metrics in multi-policy MORL: hypervolume and sparsity. (Left) Hypervolume is represented by the pink area in the figure. (Right) The sparsity of the solution set $\{A,B,D\}$ is lower than that of $\{A,C,D\}$ when points $C$ and $D$ are close, indicating that $\{A,B,D\}$ offers a more diverse set of solutions than $\{A,C,D\}$.
  • Figure 3: Our proposed reward dimension reduction framework. We design a mapping function $f: \mathbb{R}^K \rightarrow \mathbb{R}^m$ from the original reward space to the reduced reward space.
  • Figure 4: A snapshot of our considered environment: traffic light control.
  • Figure 5: t-SNE visualization of the acquired Pareto frontier points.
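
The two evaluation metrics named in the Figure 2 caption can be sketched in a few lines of NumPy. The sketch below assumes maximization, restricts hypervolume to $K=2$, and uses a sparsity definition common in the MORL literature (the average squared gap between neighbouring solutions along each objective); the paper's exact formulas are not shown on this page. The coordinates of the points `A`, `B`, `C`, `D` are made up to mimic the situation in the caption.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (maximization, K=2)."""
    pts = np.asarray(points, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pts = pts[np.all(pts > ref, axis=1)]  # keep points strictly better than ref
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(-pts[:, 0])]     # sweep from the largest first objective
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:                    # dominated points add no new area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return float(area)

def sparsity(points):
    """Sum of squared gaps between sorted neighbours per objective, over (n - 1)."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 2:
        return 0.0
    gaps = np.diff(np.sort(pts, axis=0), axis=0)  # each objective sorted independently
    return float((gaps ** 2).sum() / (len(pts) - 1))

# Hypothetical coordinates: C lies close to D, B lies between A and D.
A, B, C, D = (1.0, 4.0), (2.5, 2.5), (3.8, 1.2), (4.0, 1.0)
sp_abd = sparsity([A, B, D])
sp_acd = sparsity([A, C, D])
hv_demo = hypervolume_2d([(3, 1), (2, 2), (1, 3)], ref=(0, 0))
print(sp_abd, sp_acd, hv_demo)
```

With these coordinates the set {A, B, D} receives a lower sparsity value than {A, C, D}, matching the qualitative statement in the caption that evenly spread solutions score better on this metric.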

Theorems & Definitions (2)

  • Theorem 1
  • proof