Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
TL;DR
This work addresses intersectional fairness in reinforcement learning with large state and constraint spaces by formulating a multi-objective, state-based group reward model and a minimax objective over groups. It develops oracle-efficient reductions that transform constrained multi-objective RL into standard RL plus a group-constraint optimization, enabling scalability to exponentially many groups. The paper introduces three algorithmic regimes: (i) tabular MDPs with a linear-optimization oracle over $\mathcal{G}$, (ii) large MDPs with separator-set structure using contextual Follow-the-Perturbed-Leader (FTPL), and (iii) general group structures via FairFictRL and MORL-BRNR, with proofs of sublinear regret and convergence guarantees in the structured cases. Experiments on a Barabási–Albert graph MDP show that the proposed methods achieve low constraint violations while maintaining competitive global reward, illustrating practical trade-offs between fairness and efficiency. Overall, the work advances oracle-efficient techniques for ensuring intersectional fairness in RL, with potential impact on real-world decision-making systems where subgroup welfare must be protected across complex, overlapping demographics.
Abstract
In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum-reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large, both for tabular MDPs and for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
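To make the objective concrete, the following is a minimal sketch of the max-min problem over groups on a toy instance. It is not the paper's oracle-efficient algorithm: it simply enumerates a handful of candidate policies, which is only feasible when both the policy set and the group set are tiny. The names `group_rewards`, `maximin_policy`, the state-occupancy representation of a policy, and all numbers are assumptions introduced for illustration.

```python
def group_rewards(occupancy, reward, group_weights):
    """Expected reward accruing to each group under a fixed policy.

    occupancy[s]        -- assumed normalized visitation probability of state s
    reward[s]           -- the single scalar reward at state s
    group_weights[g][s] -- state-based weight in [0, 1] of group g at state s
    """
    return [
        sum(d * w * r for d, w, r in zip(occupancy, weights, reward))
        for weights in group_weights
    ]


def maximin_policy(occupancies, reward, group_weights):
    """Return the candidate policy whose worst-off group reward is largest."""
    best_name, best_value = None, float("-inf")
    for name, occupancy in occupancies.items():
        worst = min(group_rewards(occupancy, reward, group_weights))
        if worst > best_value:
            best_name, best_value = name, worst
    return best_name, best_value


# Toy instance (hypothetical): 3 states, 2 overlapping groups, 2 policies.
reward = [1.0, 1.0, 1.0]
group_weights = [
    [1.0, 1.0, 0.0],  # group A is present in states 0 and 1
    [0.0, 1.0, 1.0],  # group B is present in states 1 and 2
]
occupancies = {
    "greedy_for_A": [0.8, 0.1, 0.1],
    "balanced": [0.3, 0.4, 0.3],
}

best, value = maximin_policy(occupancies, reward, group_weights)
print(best, round(value, 3))
```

In this toy instance the policy that concentrates on state 0 gives group A a high reward but leaves group B with little, so the max-min criterion prefers the balanced policy. With exponentially many groups, the inner `min` can no longer be computed by enumeration, which is the difficulty the oracle-efficient algorithms are meant to address.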
