Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning

Dohyeong Kim, Mineui Hong, Jeongho Park, Songhwai Oh

TL;DR

CoMOGA (Constrained Multi-Objective Gradient Aggregator) recasts constrained multi-objective reinforcement learning (CMORL) as a constrained optimization problem in order to prevent gradient conflicts among objectives while enforcing safety constraints. Each update solves a quadratic program over linearized objectives and constraints to aggregate their gradients, and a KL-based loss then trains a universal policy that covers the spectrum of preferences over objectives. The method converges to CP-optimal policies in tabular settings and demonstrates superior hypervolume and sparsity metrics on legged locomotion, Safety Gymnasium, and MO-Gymnasium tasks, while maintaining constraint satisfaction without introducing extra optimization variables. These results suggest that CoMOGA provides stable, constraint-satisfying training for safety-aware multi-objective RL, and that the proposed generalized policy update framework may support broader convergence guarantees.
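The hypervolume and sparsity metrics mentioned above can be illustrated with a minimal sketch for two objectives. This page does not define either metric, so the sketch assumes the forms commonly used in multi-objective RL: hypervolume as the area dominated by the Pareto front relative to a reference point, and sparsity as the mean squared gap between neighboring front points along each objective. The function names and the sample points are hypothetical.

```python
def pareto_front(points):
    """Keep the points that no other point dominates (maximization)."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[i] >= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(front, ref):
    """Area dominated by a 2-D maximization front, measured from `ref`.

    Assumes every front point is at least as good as `ref` in both objectives.
    """
    area = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective; each new point adds a strip.
    for x, y in sorted(front, reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


def sparsity(front):
    """Mean squared gap between neighboring front points, summed over objectives."""
    if len(front) < 2:
        return 0.0
    total = 0.0
    for j in range(len(front[0])):
        values = sorted(p[j] for p in front)
        total += sum((b - a) ** 2 for a, b in zip(values, values[1:]))
    return total / (len(front) - 1)


# Hypothetical returns of four policies on two objectives.
returns = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
front = pareto_front(returns)
print(front, hypervolume_2d(front, (0.0, 0.0)), sparsity(front))
```

Under these assumed definitions, a larger hypervolume indicates a front that reaches further in objective space, and a smaller sparsity indicates a more evenly covered front.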

Abstract

In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. To address these considerations, we propose a constrained multi-objective RL algorithm named Constrained Multi-Objective Gradient Aggregator (CoMOGA). In the field of multi-objective optimization, managing conflicts between the gradients of the multiple objectives is crucial to prevent policies from converging to local optima. It is also essential to efficiently handle safety constraints for stable training and constraint satisfaction. We address these challenges straightforwardly by treating the maximization of multiple objectives as a constrained optimization problem (COP), where the constraints are defined to improve the original objectives. Existing safety constraints are then integrated into the COP, and the policy is updated using a linear approximation, which ensures the avoidance of gradient conflicts. Despite its simplicity, CoMOGA guarantees optimal convergence in tabular settings. Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks.
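To make the notion of a gradient conflict concrete, here is a minimal sketch. It assumes the common reading that an update direction conflicts with an objective when its inner product with that objective's gradient is negative; the formal statement of Definition 3.2 is not reproduced on this page, so this is an illustration only. The gradients, weights, and function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_scalarization(grads, weights):
    """Weighted sum of objective gradients (the LS baseline direction)."""
    dim = len(grads[0])
    return [sum(w * g[j] for w, g in zip(weights, grads)) for j in range(dim)]


def conflicting_objectives(update, grads):
    """Indices of objectives whose first-order change along `update` is negative."""
    return [i for i, g in enumerate(grads) if dot(g, update) < 0.0]


# Two hypothetical objective gradients of very different magnitude.
g1 = [1.0, 0.0]
g2 = [-3.0, 1.0]

ls_update = linear_scalarization([g1, g2], [0.5, 0.5])
print(ls_update)                                    # weighted-sum direction
print(conflicting_objectives(ls_update, [g1, g2]))  # objectives it would worsen
```

In this example the equally weighted sum is dominated by the larger gradient, so following it decreases the first objective to first order, which is the kind of conflict the abstract refers to.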

Paper Structure (32 sections, 5 theorems, 58 equations, 14 figures, 5 tables, 1 algorithm)

Key Result

Theorem 4.2

Assume that sequences $\nu^a_{t,i}$, $\nu^b_{t,i}$, $\lambda^a_{t,k}$, $\lambda^b_{t,k} \in [0, \lambda_\mathrm{max}]$ are given for all $i$ and $k$, where $\sum_k\lambda^b_{t,k} = 1$, $\lambda^b_{t,k}(J_{C_k}(\theta_t) - d_k) \geq 0$, $\sum_i\nu^a_{t,i}=1$, and $\nu^a_{t,i}$ converges to a specific [...] where $g_i = \nabla J_{R_i}(\theta_t)$, $b_k = \nabla J_{C_k}(\theta_t)$, $\alpha_t$ is a step size [...]

Figures (14)

  • Figure 1: Example of CMORL. The robot aims to maximize energy efficiency and velocity while maintaining its balance to avoid falling. In order to consider such safety and multiple objectives, CMORL finds a set of feasible policies that are not dominated by other policies and satisfy constraints, which are indicated by the dashed line.
  • Figure 2: Comparison between LS and the proposed method. Optimization trajectories are indicated in red, with initial points marked by black circles. The contours illustrate the average values of the two objective functions, while the shaded area indicates regions where the constraints are violated. The grey line represents the CP optimal set. For some initial points, LS fails to reach the optimal set, whereas CoMOGA consistently finds it from any starting position. For more details, please see Appendix \ref{sec: toy example}.
  • Figure 3: Process of CoMOGA. We visualize the process in the parameter space, and the gray areas represent constraints. (Linear approximation) CoMOGA linearly approximates the original CMORL problem in (\ref{eq: cmorl problem}). The gradients of the objective and constraint functions are visualized as black and blue arrows, respectively. (Transformation) The objectives are converted to constraints as described in (\ref{eq: gradient aggregation}). The intersection of all constraints is shown as the red area. (Scaling) The solution of the transformed problem, $\bar{g}^\mathrm{ag}$, is scaled to ensure that the updated policy remains within the local region.
  • Figure 4: Evaluation results of the legged robot locomotion tasks. The upper row shows results for the bipedal robot, while the lower row is for the quadrupedal robot. All algorithms are evaluated at every $10^5$ steps. The bold lines and shaded areas represent the mean and quarter-scaled standard deviation of results from five random seeds, respectively. The black dotted lines in constraint graphs indicate the thresholds.
  • Figure 5: Evaluation results of the Safety Gymnasium tasks.
  • ...and 9 more figures
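
The three steps in the Figure 3 caption (linear approximation, transformation, scaling) can be sketched as a small quadratic program. This is an illustration under stated assumptions, not the authors' implementation: it uses the Euclidean norm in place of whatever policy-space metric the paper uses, it treats the per-objective improvement targets `improvements` and the constraint slack `margins` (threshold minus current cost) as given inputs, and it solves the program with projected gradient ascent on the dual, which is a solver chosen here for brevity. All names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate(obj_grads, improvements, cost_grads, margins, radius, iters=20000):
    """Smallest direction g with g_i . g >= e_i and b_k . g <= margin_k, then scaled.

    The feasible set is assumed to be non-empty and the gradients non-zero.
    """
    # Transformation: every linearized objective and safety constraint
    # becomes one row of the form  row . g >= rhs.
    rows = [list(g) for g in obj_grads] + [[-x for x in b] for b in cost_grads]
    rhs = list(improvements) + [-m for m in margins]
    dim = len(rows[0])

    # Dual of  min 0.5 * ||g||^2  s.t.  rows . g >= rhs,  with g = sum_i mu_i * row_i.
    # The step size is 1 / trace(rows rows^T), a safe bound on the curvature.
    step = 1.0 / sum(dot(r, r) for r in rows)
    mu = [0.0] * len(rows)
    for _ in range(iters):
        g = [sum(m * r[j] for m, r in zip(mu, rows)) for j in range(dim)]
        mu = [max(0.0, m + step * (c - dot(r, g))) for m, r, c in zip(mu, rows, rhs)]
    g = [sum(m * r[j] for m, r in zip(mu, rows)) for j in range(dim)]

    # Scaling: keep the update inside a local region of the given radius.
    norm = math.sqrt(dot(g, g))
    if norm > radius:
        g = [radius * x / norm for x in g]
    return g


# Two hypothetical conflicting objective gradients, no safety constraint.
grads = [[1.0, 0.0], [-3.0, 1.0]]
direction = aggregate(grads, [1.0, 1.0], [], [], radius=1.0)
print(direction, [dot(g, direction) for g in grads])
```

Because each objective enters as a constraint that demands a non-negative first-order improvement, the returned direction has a non-negative inner product with every objective gradient, and rescaling does not change those signs.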

Theorems & Definitions (10)

  • Definition 3.1: Constrained Dominance
  • Definition 3.2: Gradient Conflict
  • Theorem 4.2
  • Theorem 4.3
  • Lemma B.1
  • proof
  • Theorem B.1
  • proof
  • Theorem B.1
  • proof