The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm
Giseung Park, Woohyeon Byeon, Seongmin Kim, Elad Havakuk, Amir Leshem, Youngchul Sung
TL;DR
This paper addresses fairness in multi-objective reinforcement learning by adopting a max-min criterion over the objective returns $J_k(\pi)$. It develops a theory that reformulates the problem via linear programming and convex optimization, using state–action visitation frequencies and a weight vector $w$ on the simplex $\Delta^K$. An entropy-regularized max-min formulation (P0') is introduced to resolve indeterminacy, and the primal policy is linked to a soft-optimal policy through a soft Bellman operator. The resulting gradient-based, model-free algorithm alternates between soft Q-learning for a given weight and Gaussian-smoothing gradient estimation to update the weights. The objective is shown to be convex in $w$, with P1 and P2 sharing the same optimum. The method improves max-min performance on tasks including Four-Room, traffic light control, and species conservation, outperforming utilitarian DQN and MDQN baselines. It is relevant to fair optimization across multiple objectives in control problems and MARL, where competing goals must be balanced explicitly with scalable, model-free learning.
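To make the alternating structure concrete, here is a minimal sketch on a toy one-state problem. It is not the authors' implementation. It assumes the outer problem can be read as $\min_{w \in \Delta^K} L(w)$, where $L(w)$ is the soft-optimal value of the scalarized reward $\sum_k w_k r_k$; for a single state with entropy temperature $\alpha$, that value is $\alpha \log \sum_a \exp(w^\top r_a / \alpha)$, which stands in for the soft Q-learning step. All function names, the simplex projection step, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                       # entries sorted in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1] # last index satisfying the threshold test
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)

def soft_value(w, rewards, alpha):
    """Soft-optimal value of a one-state problem (stand-in for soft Q-learning).

    rewards has shape (num_actions, K); row a holds the K objective rewards of action a.
    """
    z = rewards @ w / alpha
    m = z.max()                                # subtract the max for numerical stability
    return alpha * (m + np.log(np.exp(z - m).sum()))

def soft_policy(w, rewards, alpha):
    """Softmax policy induced by the scalarized rewards."""
    z = rewards @ w / alpha
    p = np.exp(z - z.max())
    return p / p.sum()

def smoothed_gradient(f, w, mu, n_samples, rng):
    """Gaussian-smoothing gradient estimate using only evaluations of f."""
    u = rng.standard_normal((n_samples, w.size))
    base = f(w)
    diffs = np.array([f(w + mu * ui) - base for ui in u])
    return (diffs[:, None] * u).mean(axis=0) / mu

def minimize_over_weights(rewards, alpha=0.05, mu=0.05, lr=0.05,
                          n_iters=500, n_samples=32, seed=0):
    """Projected gradient descent on L(w) over the simplex (assumed update rule)."""
    rng = np.random.default_rng(seed)
    num_objectives = rewards.shape[1]
    w = np.full(num_objectives, 1.0 / num_objectives)  # start from uniform weights
    f = lambda x: soft_value(x, rewards, alpha)
    for _ in range(n_iters):
        g = smoothed_gradient(f, w, mu, n_samples, rng)
        w = project_to_simplex(w - lr * g)
    return w
```

For example, with `rewards = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.3]])`, calling `w = minimize_over_weights(rewards)` and then `soft_policy(w, rewards, 0.05) @ rewards` gives the per-objective returns of the resulting policy. In a full reinforcement learning setting, `soft_value` would be replaced by a learned soft Q-function evaluated at the current weight, so each function evaluation is itself noisy; the gradient estimator only needs such evaluations, which is what makes the scheme model-free.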
Abstract
In this paper, we consider multi-objective reinforcement learning, which arises in many real-world problems with multiple optimization goals. We approach the problem with a max-min framework that focuses on fairness among the multiple goals, and we develop a relevant theory and a practical model-free algorithm under this framework. The theory constitutes an advance in multi-objective reinforcement learning, and the proposed algorithm demonstrates a notable performance improvement over existing baseline methods.
