Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning
Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, Ming Jin
TL;DR
This work tackles safe, multi-objective reinforcement learning by proposing a primal-based framework (CR-MOPO) that simultaneously optimizes multiple task objectives while enforcing hard safety constraints. The core innovation is a Conflict-Averse Natural Policy Gradient (CA-NPG) that mitigates gradient conflicts among objectives, coupled with a constraint-rectification mechanism to enforce safety when needed. The authors provide theoretical convergence and constraint-violation guarantees in the tabular setting and demonstrate superior performance over state-of-the-art baselines CRPO and LP3 on the Safe Multi-Objective MuJoCo benchmark. The approach yields monotonic improvements in objective rewards while maintaining safety, offering a practical pathway toward robust, safe MORL in complex control tasks.
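The alternation between objective improvement and constraint rectification described above can be sketched as a simple switching rule. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_update_direction` is a hypothetical helper, plain gradients stand in for natural policy gradients, a plain mean stands in for the conflict-averse combination, and rectifying the most-violated constraint is an assumed choice.

```python
import numpy as np

def select_update_direction(objective_grads, constraint_grads,
                            constraint_costs, limits, tolerance=0.0):
    """Pick an ascent direction for the policy parameters (hypothetical helper).

    Assumptions: plain gradients replace natural gradients, and a simple
    mean replaces the conflict-averse combination of objective gradients.
    """
    violations = np.asarray(constraint_costs) - np.asarray(limits)
    worst = int(np.argmax(violations))
    if violations[worst] > tolerance:
        # Rectification step: move against the gradient of the
        # most-violated constraint cost to reduce the violation.
        return -np.asarray(constraint_grads[worst]), "rectify"
    # All constraints are within their limits: improve the task objectives.
    return np.mean(np.asarray(objective_grads), axis=0), "optimize"

# Toy usage: one constraint with limit 1.0 and two objectives.
obj_grads = [[1.0, 0.0], [0.0, 1.0]]
con_grads = [[0.5, 0.5]]

unsafe_dir, unsafe_mode = select_update_direction(obj_grads, con_grads, [1.2], [1.0])
safe_dir, safe_mode = select_update_direction(obj_grads, con_grads, [0.8], [1.0])
print(unsafe_mode, unsafe_dir)
print(safe_mode, safe_dir)
```

In the toy call, a cost above its limit triggers the rectification branch, and a cost below its limit lets the objectives drive the update.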
Abstract
In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gradient manipulation technique to optimize multiple RL objectives and overcome conflicting gradients between different tasks: because the gradients of different task objectives can be misaligned, a simple weighted-average gradient direction may not benefit the performance of specific tasks. When a hard constraint is violated, our algorithm steps in to rectify the policy and minimize the violation. We establish theoretical convergence and constraint violation guarantees in a tabular setting. Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
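To make the gradient-conflict point concrete, the following toy example uses two hand-picked 2-D gradients (illustrative values, not taken from the paper) and shows that stepping along their average decreases the first objective to first order. For contrast, it also computes the two-gradient min-norm direction from the multiple-gradient descent algorithm (MGDA). That direction is a well-known alternative used here only for illustration; it is not the CA-NPG update proposed in this work.

```python
import numpy as np

# Illustrative gradients of two task objectives (assumed values).
g1 = np.array([1.0, 0.0])
g2 = np.array([-3.0, 1.0])

# A negative cosine similarity indicates conflicting gradients.
cosine = float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Equal-weight average direction.
average = 0.5 * (g1 + g2)

# First-order change of each objective for a small step along `average`.
change_1 = float(g1 @ average)
change_2 = float(g2 @ average)

# MGDA-style min-norm point in the convex hull of {g1, g2}:
# minimize ||alpha * g1 + (1 - alpha) * g2||^2 over alpha in [0, 1].
diff = g1 - g2
alpha = float(np.clip((g2 - g1) @ g2 / (diff @ diff), 0.0, 1.0))
d = alpha * g1 + (1.0 - alpha) * g2

print("cosine similarity:", cosine)
print("average direction:", average, "changes:", change_1, change_2)
print("min-norm direction:", d, "changes:", float(g1 @ d), float(g2 @ d))
```

Here the average direction has a negative inner product with `g1`, so objective 1 gets worse even though objective 2 improves, while the min-norm direction has a positive inner product with both gradients.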
