Group Fairness in Multi-Task Reinforcement Learning
Kefan Song, Runnan Jiang, Rohan Chandra, Shangtong Zhang
TL;DR
The paper addresses the problem of enforcing demographic parity between demographic groups on every task of a multi-task RL problem (multi-task group fairness), formulated as constrained optimization in both the finite-horizon and infinite-horizon settings. It introduces a conservative policy selection mechanism that combines optimistic and pessimistic reward estimates with confidence bounds. In the finite-horizon episodic setting, this mechanism guarantees with high probability that the fairness constraints are never violated, while still achieving sublinear regret. Experiments on RiverSwim and MuJoCo show smaller fairness gaps across tasks than methods without explicit multi-task fairness constraints, with comparable returns. The approach is relevant to real-world multi-task systems such as recommender systems and RLHF, and is a step toward socially responsible sequential decision making.
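The conservative selection idea summarized above can be illustrated with a minimal Python sketch. It assumes that per-task, per-group confidence intervals on expected return are already available for each candidate policy, and that a hypothetical fallback `baseline` policy is known to be fair. The names `PolicyEstimate`, `worst_case_gap`, `optimistic_value`, and `select_policy` are illustrative and do not come from the paper. A candidate is accepted only if the largest fairness gap consistent with its intervals stays within `eps` on every task. Among accepted candidates, the one with the highest optimistic (upper-bound) return is chosen. This is a generic interval-based rule, not necessarily the authors' exact algorithm.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class PolicyEstimate:
    """Hypothetical confidence intervals for one candidate policy.

    lower[k][g] and upper[k][g] bound the expected return of group g on task k.
    """
    name: str
    lower: List[List[float]]
    upper: List[List[float]]


def worst_case_gap(est: PolicyEstimate) -> float:
    """Largest group return gap consistent with the intervals, over all tasks and group pairs."""
    gap = 0.0
    for lo_k, up_k in zip(est.lower, est.upper):
        for a, b in combinations(range(len(lo_k)), 2):
            # |J_a - J_b| can be at most the distance between the far ends of the two intervals.
            gap = max(gap, up_k[a] - lo_k[b], up_k[b] - lo_k[a])
    return gap


def optimistic_value(est: PolicyEstimate) -> float:
    """Optimistic objective: sum of upper bounds over all tasks and groups."""
    return sum(sum(up_k) for up_k in est.upper)


def select_policy(candidates: List[PolicyEstimate], baseline: str, eps: float) -> str:
    """Pick the most optimistic candidate certified fair on every task, else fall back."""
    certified = [c for c in candidates if worst_case_gap(c) <= eps]
    if not certified:
        return baseline  # assumption: the baseline is already known to satisfy the constraint
    return max(certified, key=optimistic_value).name


# Usage: two tasks, two groups per task.
fair = PolicyEstimate("fair", lower=[[1.0, 0.9], [2.0, 2.0]], upper=[[1.1, 1.0], [2.1, 2.1]])
greedy = PolicyEstimate("greedy", lower=[[3.0, 1.0], [3.0, 3.0]], upper=[[3.2, 1.2], [3.2, 3.1]])
print(select_policy([fair, greedy], baseline="baseline", eps=0.3))  # fair
```

Certifying against the worst case over the intervals is what makes the rule conservative: a policy whose true gap is small but poorly estimated is rejected until its intervals shrink, which is one way a high-probability zero-violation guarantee can be traded against faster exploitation.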
Abstract
This paper addresses a critical societal consideration in the application of Reinforcement Learning (RL): ensuring equitable outcomes across demographic groups in multi-task settings. While previous work has explored fairness in single-task RL, many real-world applications are inherently multi-task and require policies that remain fair on every task. We introduce a novel formulation of multi-task group fairness in RL and propose a constrained optimization algorithm that explicitly enforces fairness constraints across multiple tasks simultaneously. In the finite-horizon episodic setting, we show that, with high probability, our algorithm never violates the fairness constraints while achieving sublinear regret. Through experiments in RiverSwim and MuJoCo environments, in both the finite-horizon and the infinite-horizon settings, we demonstrate that our approach better ensures group fairness across multiple tasks than previous methods that lack explicit multi-task fairness constraints. Our results show that the proposed algorithm achieves smaller fairness gaps while maintaining comparable returns across demographic groups and tasks, suggesting its potential for addressing fairness concerns in real-world multi-task RL applications.
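For concreteness, one plausible way to write a finite-horizon multi-task demographic-parity problem is shown below. The notation is an assumption made here for illustration, not taken from the paper: K tasks, G groups each acting in its own environment with its own policy pi_g, horizon H, tolerance epsilon, reward r_k^g for task k in group g's environment, and J_k^g(pi_g) the expected return of group g on task k. The objective (here a plain sum of returns) and the exact form of the constraint in the paper may differ.

```latex
\max_{\pi_1,\dots,\pi_G}\ \sum_{k=1}^{K}\sum_{g=1}^{G} J_k^{g}(\pi_g)
\quad \text{s.t.} \quad
\bigl|J_k^{g}(\pi_g) - J_k^{g'}(\pi_{g'})\bigr| \le \varepsilon
\quad \forall\, k \in \{1,\dots,K\},\ \forall\, g, g' \in \{1,\dots,G\},
\qquad
J_k^{g}(\pi) = \mathbb{E}_{\pi}\Bigl[\sum_{h=1}^{H} r_k^{g}(s_h, a_h)\Bigr].
```

Requiring the bound separately for every task k is what distinguishes this from single-task group fairness: a set of policies that is fair on average over tasks can still violate the constraint on an individual task.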
