Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Dohyeong Kim, Mineui Hong, Jeongho Park, Songhwai Oh
TL;DR
CoMOGA (Constrained Multi-Objective Gradient Aggregator) reframes constrained multi-objective RL (CMORL) as a constrained optimization problem, which prevents gradient conflicts among the objectives while enforcing safety constraints. Each update direction is obtained by solving a quadratic program over linearized objectives and constraints that aggregates their gradients; a KL-based loss then trains a universal policy that covers the range of preferences over the objectives. The approach converges to constrained-Pareto (CP) optimal policies in tabular settings and achieves superior hypervolume and sparsity on legged locomotion, Safety Gymnasium, and MO-Gymnasium tasks, while satisfying the constraints without introducing extra optimization variables. These results suggest that CoMOGA offers stable, scalable, constraint-satisfying policy improvement for safety-aware multi-objective RL, with potential for broader convergence guarantees under the proposed generalized policy update framework.
Abstract
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. To address these considerations, we propose a constrained multi-objective RL algorithm named Constrained Multi-Objective Gradient Aggregator (CoMOGA). In the field of multi-objective optimization, managing conflicts between the gradients of the multiple objectives is crucial to prevent policies from converging to local optima. It is also essential to efficiently handle safety constraints for stable training and constraint satisfaction. We address these challenges straightforwardly by treating the maximization of multiple objectives as a constrained optimization problem (COP), where the constraints are defined to improve the original objectives. Existing safety constraints are then integrated into the COP, and the policy is updated using a linear approximation, which ensures the avoidance of gradient conflicts. Despite its simplicity, CoMOGA guarantees optimal convergence in tabular settings. Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks.
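To make the idea of conflict-averse aggregation concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain Euclidean norm in place of any trust-region metric, and it uses hypothetical improvement thresholds `e` and a hypothetical cost budget `c`. It finds the smallest update direction `d` such that every linearized objective improves by at least its threshold (`g_i . d >= e_i`) and the linearized cost increases by at most `c` (`b . d <= c`). The function name `aggregate_gradients` and the dual projected-gradient solver are illustrative choices.

```python
import numpy as np


def aggregate_gradients(G, e, steps=5000):
    """Return the minimum-norm direction d satisfying G @ d >= e.

    Primal: min 0.5 * ||d||^2   s.t.  G d >= e
    Dual:   max_{lam >= 0}  lam^T e - 0.5 * lam^T (G G^T) lam,  d = G^T lam
    The dual is solved by projected gradient ascent (illustrative choice).
    """
    G = np.asarray(G, dtype=float)
    e = np.asarray(e, dtype=float)
    K = G @ G.T
    # Step size 1/L, where L is the largest eigenvalue of the dual Hessian.
    lr = 1.0 / (np.linalg.eigvalsh(K).max() + 1e-12)
    lam = np.zeros(len(e))
    for _ in range(steps):
        # Ascent step on the dual, then projection onto lam >= 0.
        lam = np.maximum(0.0, lam + lr * (e - K @ lam))
    return G.T @ lam


# Two conflicting objective gradients (negative inner product) and one cost gradient.
g1 = np.array([1.0, 0.0])
g2 = np.array([-3.0, 1.0])
b = np.array([0.0, 1.0])   # gradient of the safety cost (hypothetical values)
c = 5.0                    # allowed linearized cost increase (hypothetical)

# The cost constraint b . d <= c is rewritten as (-b) . d >= -c.
G = np.stack([g1, g2, -b])
e = np.array([1.0, 1.0, -c])

d = aggregate_gradients(G, e)
naive = g1 + g2            # plain gradient sum, ignores the conflict

print("naive change in objective 1:", g1 @ naive)
print("aggregated direction:", d)
print("objective changes:", g1 @ d, g2 @ d, "cost change:", b @ d)
```

In this toy case the plain gradient sum has a negative inner product with `g1`, so a step along it would reduce the first objective to first order. The aggregated direction improves both objectives and stays within the cost budget, which is the behavior the abstract attributes to treating objective improvement as constraints.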
