Falcon: Fair Active Learning using Multi-armed Bandits

Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang

TL;DR

Biased data can produce unfair ML models. Falcon addresses this with a data-centric fair active learning framework that strategically selects which samples to label in order to improve group fairness. Because the ground-truth labels that define the target groups are unknown at selection time, Falcon uses a trial-and-error labeling strategy, and it uses adversarial multi-armed bandits to automatically select the sampling policy that best balances informativeness against postpone rate. The approach also blends fairness-driven labeling with traditional active learning to improve accuracy while maintaining fairness. Experiments on four real datasets show that Falcon significantly outperforms baselines in fairness and accuracy while being more efficient, with maximum fairness scores 1.8–4.5x higher than the second-best methods.

Abstract

Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises since these target groups are defined using ground truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method, where we postpone using a sample if the predicted label is different from the expected one and falls outside the target group. We also observe the trade-off that selecting more informative samples results in higher likelihood of postponing due to undesired label prediction, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postpone rate as policies and propose to automatically select the best policy using adversarial multi-armed bandit methods, given their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in terms of fairness and accuracy and is more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness where its maximum fairness score is 1.8-4.5x higher than the second-best results.
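
To make the policy-selection idea in the abstract concrete, here is a minimal sketch of an adversarial bandit (standard EXP3) choosing among sampling policies. It is an illustration under stated assumptions, not the authors' implementation: each policy is modeled as a hypothetical pair `(p_target, gain)`, where `p_target` is the chance that an acquired label falls in the target group and `gain` is a fairness improvement in [0, 1]. The function names, the reward definition, and the toy numbers are all assumptions introduced here.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix normalized weights with uniform exploration (standard EXP3)."""
    k = len(weights)
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def run_policy_selection(policies, rounds=2000, gamma=0.1, seed=0):
    """Toy simulation of bandit-based policy selection.

    Each hypothetical policy is a pair (p_target, gain):
      p_target: probability that the acquired label is the desired one
      gain:     assumed fairness improvement in [0, 1] when it is
    """
    rng = random.Random(seed)
    k = len(policies)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        counts[arm] += 1
        p_target, gain = policies[arm]
        # Trial and error: an undesired label means the sample is
        # postponed, which this sketch models as a reward of zero.
        reward = gain if rng.random() < p_target else 0.0
        # Importance-weighted reward estimate, then exponential update.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
    return weights, counts
```

For example, with `policies = [(0.9, 0.2), (0.6, 0.8), (0.1, 1.0)]`, the first policy rarely postpones but gains little, the last gains a lot but is almost always postponed, and the middle one has the highest expected reward (`p_target * gain`), so the bandit should end up selecting it most often.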

Paper Structure (62 sections, 7 equations, 12 figures, 14 tables, 2 algorithms)

Figures (12)

  • Figure 1: Fair active learning involves selecting samples that, when labeled, would enhance the fairness of a machine learning model according to a specific group fairness measure.
  • Figure 2: Overview of Falcon workflow.
  • Figure 3: Comparing the trial-and-error approach with the baselines on the TravelTime and Employ datasets [ding2021retiring], where the target fairness measure is demographic parity (DP). Only the trial-and-error solution actually improves the DP score.
  • Figure 4: If the positive class is our target group, sample $A$ increases the target group accuracy more than $B$ if positively labeled and is thus more informative. However, $A$ is less likely to have the desired target label, leading to a higher postpone rate. It is non-trivial to balance between these two factors.
  • Figure 5: Policy comparison using the TravelTime and Employ datasets where the $i^{th}$ target groups are denoted as $T_i$ and $E_i$, respectively. DP fairness is used. The best policy depends on the dataset and how much labeling has been done.
  • ...and 7 more figures
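
Several of the captions above refer to demographic parity (DP). As a rough illustration, DP compares the positive prediction rate across sensitive groups. The sketch below computes the gap between the highest and lowest group rates; the function name and the toy data are assumptions made here, and the exact DP score used in the paper may be defined differently.

```python
def demographic_parity_gap(predictions, groups):
    """Return (gap, rates) where rates maps each group to its
    positive prediction rate and gap = max rate - min rate."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Toy data (hypothetical): binary predictions and a group attribute.
predictions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(predictions, groups)
```

Here group `a` receives positive predictions at a rate of 0.75 and group `b` at 0.25, so the gap is 0.5; a gap of 0 would mean both groups receive positive predictions at the same rate.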

Theorems & Definitions (5)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Definition 1