Group Fairness in Multi-Task Reinforcement Learning

Kefan Song, Runnan Jiang, Rohan Chandra, Shangtong Zhang

TL;DR

The paper studies how to enforce demographic parity across multiple RL tasks (multi-task group fairness), formulating constrained problems for both the finite-horizon and infinite-horizon settings. It introduces a conservative policy selection mechanism that uses optimistic/pessimistic rewards and confidence bounds. In the finite-horizon episodic setting, the resulting algorithm incurs zero fairness violations with high probability and achieves sublinear regret. Experiments on RiverSwim and MuJoCo show smaller fairness gaps across tasks with comparable returns. The approach targets real-world multi-task RL systems such as recommender systems and RLHF.

Abstract

This paper addresses a critical societal consideration in the application of Reinforcement Learning (RL): ensuring equitable outcomes across different demographic groups in multi-task settings. While previous work has explored fairness in single-task RL, many real-world applications are multi-task in nature and require policies to maintain fairness across all tasks. We introduce a novel formulation of multi-task group fairness in RL and propose a constrained optimization algorithm that explicitly enforces fairness constraints across multiple tasks simultaneously. We show that, in the finite-horizon episodic setting, our proposed algorithm violates no fairness constraint with high probability and achieves sublinear regret. Through experiments in RiverSwim and MuJoCo environments, covering both the finite-horizon and infinite-horizon settings, we demonstrate that our approach better ensures group fairness across multiple tasks than previous methods that lack explicit multi-task fairness constraints. Our results show that the proposed algorithm achieves smaller fairness gaps while maintaining comparable returns across different demographic groups and tasks, suggesting its potential for addressing fairness concerns in real-world multi-task RL applications.
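To make the idea of a conservative, confidence-bound-based fairness check concrete, here is a minimal Python sketch. It is not the paper's algorithm: the names `worst_case_gap`, `is_certified_fair`, and `select_policy`, the array layout (tasks by groups), the confidence radius `beta`, and the tolerance `eps` are all assumptions made for illustration. The sketch certifies a candidate policy only if an upper bound on the between-group return gap stays within `eps` on every task, and then picks the certified candidate with the largest estimated total return.

```python
import numpy as np

def worst_case_gap(J_hat, beta):
    """Per-task upper bound on the largest between-group return gap.

    J_hat: array of shape (num_tasks, num_groups) with estimated returns.
    beta:  confidence radius (scalar or broadcastable array), assuming
           |J_hat - J_true| <= beta holds for every entry.
    The bound (max upper) - (min lower) is valid but deliberately conservative.
    """
    upper = J_hat + beta
    lower = J_hat - beta
    return upper.max(axis=1) - lower.min(axis=1)

def is_certified_fair(J_hat, beta, eps):
    """True only if the worst-case gap is within eps on every task."""
    return bool(np.all(worst_case_gap(J_hat, beta) <= eps))

def select_policy(candidates, eps):
    """Return the name of the certified candidate with the highest estimated
    total return, or None if no candidate can be certified (hypothetical rule)."""
    best_name, best_value = None, -np.inf
    for name, (J_hat, beta) in candidates.items():
        if is_certified_fair(J_hat, beta, eps) and J_hat.sum() > best_value:
            best_name, best_value = name, J_hat.sum()
    return best_name

# Made-up estimates for three hypothetical policies, two tasks, two groups.
candidates = {
    "A": (np.array([[10.0, 9.5], [8.0, 6.0]]), 0.2),  # large gap on task 2
    "B": (np.array([[9.0, 8.8], [7.0, 6.7]]), 0.2),   # small gaps on both tasks
    "C": (np.array([[9.5, 9.0], [7.5, 7.0]]), 0.3),   # wider confidence radius
}
chosen = select_policy(candidates, eps=1.0)
```

In this toy setup, a policy with a higher estimated return can be rejected because its gap cannot be certified on every task, which is the trade-off a conservative selection rule accepts in exchange for avoiding violations.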

Paper Structure

This paper contains 26 sections, 9 theorems, 121 equations, 7 figures, 2 tables, 3 algorithms.

Key Result

Lemma 1

(Lemma C.2 of satija2023group) On the good event $\mathcal{E}$, for any policy $\pi$ and group $z \in \mathcal{Z}$, using the optimistic reward leads to a higher return than the true return.

Proof. For any $k, h, s, a$, by the definition of optimistic reward from Equation (eq:optimistic reward), we have Additionally, by Hölder's inequality Using the value difference lemma (lemma:value differe
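The optimism claim in this lemma can be checked numerically on a small tabular example. The sketch below is an illustration under stated assumptions, not the paper's construction: the bonus is taken to be the exact reward error plus `H` times the L1 transition error, standing in for a confidence radius that dominates both errors on the good event, and the names `evaluate`, `P_hat`, `r_hat`, and `bonus` are hypothetical. It evaluates one fixed policy in the true model with the true reward and in an estimated model with the bonus-augmented reward.

```python
import numpy as np

def evaluate(P, r, pi, H):
    """Finite-horizon value of a deterministic policy from every start state.

    P:  transitions, shape (S, A, S).   r: rewards, shape (S, A).
    pi: actions, shape (H, S), where pi[h, s] is the action at step h in state s.
    """
    S = r.shape[0]
    idx = np.arange(S)
    V = np.zeros(S)
    for h in reversed(range(H)):       # backward induction over the horizon
        a = pi[h]
        V = r[idx, a] + P[idx, a] @ V  # Bellman backup under the fixed policy
    return V

rng = np.random.default_rng(0)
S, A, H = 4, 2, 5

# True model with rewards in [0, 1], so true values never exceed H.
P_true = rng.dirichlet(np.ones(S), size=(S, A))
r_true = rng.uniform(0.0, 1.0, size=(S, A))

# Estimated model: a perturbed copy of the true one (still valid distributions).
P_hat = 0.9 * P_true + 0.1 * rng.dirichlet(np.ones(S), size=(S, A))
r_hat = np.clip(r_true + rng.uniform(-0.05, 0.05, size=(S, A)), 0.0, 1.0)

# Assumed bonus: covers the reward error and, via Hoelder's inequality,
# the transition error |(P_hat - P_true) V| <= ||P_hat - P_true||_1 * H.
bonus = np.abs(r_hat - r_true) + H * np.abs(P_hat - P_true).sum(axis=-1)
r_opt = r_hat + bonus

pi = rng.integers(0, A, size=(H, S))   # arbitrary fixed policy
V_true = evaluate(P_true, r_true, pi, H)
V_opt = evaluate(P_hat, r_opt, pi, H)
optimism_holds = bool(np.all(V_opt >= V_true - 1e-12))
```

Because the bonus at every state-action pair dominates the one-step reward and transition errors, the optimistic value is at least the true value from every start state, which mirrors the inductive argument behind the lemma.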

Figures (7)

  • Figure 1: Results for both tasks: The first row shows results for Task 1, and the second row shows results for Task 2. Columns represent subgroup returns and fairness gaps.
  • Figure 2: Comparison between Ant and Humanoid: Performance and Fairness Gaps
  • Figure 3: Comparison between Hopper and Humanoid: Performance and Fairness Gaps
  • Figure 4: Comparison between Hopper and HugeGravity HalfCheetah: Performance and Fairness Gaps
  • Figure 5: Comparison between Original HalfCheetah and HugeGravity HalfCheetah: Performance and Fairness Gaps
  • ...and 2 more figures

Theorems & Definitions (9)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9