Balancing the Scales: Reinforcement Learning for Fair Classification

Leon Eshuijs, Shihan Wang, Antske Fokkens

TL;DR

This work reframes fair classification as a Contextual Multi-Armed Bandit problem and leverages reward scaling to counter imbalances across classes and protected groups. It evaluates three RL algorithms—LinUCB, DQN_bandit, and PPO_bandit—as well as a supervised loss-reweighting baseline, on BiasBios (multi-class) and Emoji (binary) datasets, demonstrating that reward scaling can improve fairness with varying effects on accuracy. The study finds that deep RL methods excel in multi-class settings while a classical CMAB approach can be highly effective in binary tasks, and that Equal Opportunity-based scaling often yields robust fairness with modest accuracy trade-offs. These findings suggest RL-based fairness tools can adapt to diverse data imbalances and highlight practical considerations for deploying fair classifiers in real-world settings.

Abstract

Fairness in classification tasks has traditionally focused on removing bias from neural representations, but recent trends favor algorithmic methods that embed fairness into the training process. These methods steer models towards fair performance and avoid the loss of valuable information that manipulating representations can cause. Reinforcement Learning (RL), with its capacity for learning through interaction and for adjusting reward functions to encourage desired behaviors, emerges as a promising tool in this domain. In this paper, we explore the use of RL to mitigate bias in imbalanced classification by scaling the reward function. We employ the contextual multi-armed bandit framework and adapt three popular RL algorithms to suit our objectives, demonstrating a novel approach to bias mitigation.
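
To make the idea of reward scaling concrete, the following is a minimal sketch of classification treated as a one-step bandit interaction, where the reward for a correct prediction is multiplied by an entry of a scaling matrix indexed by class and protected group. The inverse-frequency scaling, the 0/1 base reward, and the names `inverse_frequency_scales` and `scaled_reward` are assumptions made for illustration; they are not the scaling functions or reward definition used in the paper.

```python
import numpy as np


def inverse_frequency_scales(labels, groups, n_classes, n_groups):
    """Hypothetical scaling matrix W[c, g] (an assumption, not the paper's formula).

    Each observed (class, group) cell gets a weight proportional to the
    inverse of its count, normalised so observed cells average to 1.
    Unobserved cells keep a weight of 0.
    """
    labels = np.asarray(labels, dtype=int)
    groups = np.asarray(groups, dtype=int)

    # Count how often each (class, group) pair occurs in the training data.
    counts = np.zeros((n_classes, n_groups))
    np.add.at(counts, (labels, groups), 1)

    scales = np.zeros_like(counts)
    observed = counts > 0
    scales[observed] = 1.0 / counts[observed]
    scales[observed] /= scales[observed].mean()
    return scales


def scaled_reward(action, label, group, scales):
    """One bandit step: the chosen arm is the predicted class.

    Assumed base reward: 1 for a correct prediction, 0 otherwise.
    The scale depends on the true class and the protected group.
    """
    return float(scales[label, group]) if action == label else 0.0


# Toy data: class 0 is dominated by group 0, class 1 only contains group 1.
train_labels = [0, 0, 0, 1]
train_groups = [0, 0, 1, 1]
W = inverse_frequency_scales(train_labels, train_groups, n_classes=2, n_groups=2)

# A correct prediction on the rarer (class 0, group 1) cell earns a larger reward
# than a correct prediction on the more frequent (class 0, group 0) cell.
print(scaled_reward(action=0, label=0, group=1, scales=W))
print(scaled_reward(action=0, label=0, group=0, scales=W))
```

Under this assumed scheme, any bandit learner that maximises expected reward is pushed to pay more attention to under-represented (class, group) combinations, which is the intuition behind countering imbalance through the reward rather than through the representation.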

Paper Structure

This paper contains 44 sections, 15 equations, 6 figures, 11 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of the classification setup with input vector $x$ and output class $a$ for Reinforcement Learning and Supervised Learning, highlighting the position of the reward scaling matrix $\mathcal{W}_{RS}$.
  • Figure 2: Reward scales for two professions with different gender imbalances, Professor (50/50) and Nurse (90/10), under the different scaling functions.
  • Figure 3: TPR gap plotted against the gender distribution per profession for LinUCB. Left: without reward scaling. Right: with EO reward scaling.
  • Figure 4: Performance (Accuracy) and Fairness (GAP) on the Emoji dataset using different stereotyping ratios. All models use the scaling of $\mathcal{W}^{EO}$.
  • Figure 5: Evaluation accuracy of the different algorithms on the full 28 classes and on the 8-class subset of the Bias in Bios dataset.
  • ...and 1 more figures
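
The TPR gap and GAP quantities referenced in the captions of Figures 3 and 4 can be illustrated with a short sketch. It assumes a binary protected attribute, defines the per-class gap as the difference in true positive rate between the two groups, and aggregates with a root mean square over classes. Both the aggregation and the function names `tpr_gap_per_class` and `rms_gap` are assumptions for illustration; the paper's exact GAP definition may differ.

```python
import numpy as np


def tpr_gap_per_class(y_true, y_pred, groups, n_classes):
    """Per-class TPR difference between group 0 and group 1.

    Assumes a binary protected attribute encoded as 0/1. Classes that lack
    examples from either group are reported as NaN.
    """
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    gaps = np.full(n_classes, np.nan)
    for c in range(n_classes):
        in_class = y_true == c
        tprs = []
        for g in (0, 1):
            mask = in_class & (groups == g)
            if not mask.any():
                break
            # TPR for class c in group g: share of true-c examples predicted as c.
            tprs.append((y_pred[mask] == c).mean())
        if len(tprs) == 2:
            gaps[c] = tprs[0] - tprs[1]
    return gaps


def rms_gap(gaps):
    """Root mean square of the per-class gaps, ignoring NaN entries (assumed aggregation)."""
    return float(np.sqrt(np.nanmean(np.square(gaps))))


# Toy example with two classes and two groups.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
groups = [0, 0, 1, 1, 0, 1]

gaps = tpr_gap_per_class(y_true, y_pred, groups, n_classes=2)
print(gaps, rms_gap(gaps))
```

A gap of zero for a class means both groups are recognised equally often when they truly belong to that class, which corresponds to the Equal Opportunity criterion for that class.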