Evolved Sample Weights for Bias Mitigation: Effectiveness Depends on Optimization Objectives

Anil K. Saini; Jose Guadalupe Hernandez; Emily F. Wong; Debanshi Misra; Jason H. Moore

TL;DR

This work addresses algorithmic bias in real-world ML by evaluating reweighting as a pre-processing bias-mitigation technique. It systematically compares three weighting schemes, Equal Weights, Deterministic Weights, and Evolved Weights (EW), across 11 publicly available datasets; EW uses a Genetic Algorithm (NSGA-II) to optimize jointly for predictive performance and fairness. The study finds that EW often yields higher Pareto-front hypervolume than the baselines, but the extent of the improvement depends strongly on the selected objective pair: accuracy ($ACC$) or area under the ROC curve ($AUROC$) combined with demographic parity difference ($DPD$) or subgroup false negative fairness ($SFN$). This underscores the importance of aligning optimization criteria with the desired fairness-performance outcomes and demonstrates the potential of evolutionary reweighting for bias mitigation in high-stakes domains such as healthcare.
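To make the hypervolume comparison concrete, the following is a minimal sketch of a two-objective hypervolume computation. It assumes both objectives are expressed as quantities to minimize (for example, error rate $1 - ACC$ and $DPD$) and uses a reference point of $(1, 1)$; the function names, the reference point, and the sample values are illustrative assumptions, not details taken from the paper.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming both coordinates are minimized.

    The result is sorted by the first coordinate in ascending order, so the
    second coordinate is strictly descending along the front.
    """
    front = []
    best_second = float("inf")
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by the front and bounded by the reference point `ref`.

    `ref` is a hypothetical worst-case corner; points that do not strictly
    improve on it in both coordinates contribute nothing and are dropped.
    """
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_second = ref[1]
    for first, second in pareto_front(inside):
        # Each front point adds a rectangle stretching to the reference point
        # in the first coordinate, and down from the previous point's height.
        area += (ref[0] - first) * (prev_second - second)
        prev_second = second
    return area


# Illustrative (error rate, DPD) pairs for three hypothetical models.
models = [(0.2, 0.3), (0.3, 0.1), (0.25, 0.35)]
front = pareto_front(models)
volume = hypervolume_2d(models)
```

Under these assumptions, a larger hypervolume means the set of models covers more of the objective space between the front and the worst-case corner, which is why it can serve as a single score for a fairness-performance trade-off.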

Abstract

Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting is a method that can mitigate such bias in model predictions by assigning a weight to each data point used during model training. In this paper, we compare three methods for generating these weights: (1) evolving them using a Genetic Algorithm (GA), (2) computing them using only dataset characteristics, and (3) assigning equal weights to all data points. Model performance under each strategy was evaluated using paired predictive and fairness metrics, which also served as optimization objectives for the GA during evolution. Specifically, we used two predictive metrics (accuracy and area under the Receiver Operating Characteristic curve) and two fairness metrics (demographic parity difference and subgroup false negative fairness). Using experiments on eleven publicly available datasets (including two medical datasets), we show that evolved sample weights can produce models that achieve better trade-offs between fairness and predictive performance than alternative weighting methods. However, the magnitude of these benefits depends strongly on the choice of optimization objectives. Our experiments reveal that optimizing with accuracy and demographic parity difference metrics yields the largest number of datasets for which evolved weights are significantly better than other weighting strategies in optimizing both objectives.
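As a rough illustration of two ideas mentioned above, the sketch below computes a demographic parity difference and one well-known family of weights derived only from dataset characteristics. Both are assumptions for illustration: the fairness metric is taken to be the largest gap in positive-prediction rate between groups, and the weighting rule $w(g, y) = P(g)\,P(y) / P(g, y)$ is the reweighing scheme of Kamiran and Calders, which may differ from the exact deterministic weights used in the paper. All function and variable names are hypothetical.

```python
from collections import Counter


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    totals, positives = Counter(), Counter()
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def reweighing_weights(y_true, groups):
    """Per-sample weight w(g, y) = P(g) * P(y) / P(g, y).

    Assumed scheme (Kamiran-Calders reweighing): group-label combinations
    that are rarer than independence would predict get weights above 1.
    """
    n = len(y_true)
    group_counts = Counter(groups)
    label_counts = Counter(y_true)
    joint_counts = Counter(zip(groups, y_true))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, y_true)
    ]


# Toy data: group "a" has three samples, group "b" has one.
groups = ["a", "a", "a", "b"]
y_true = [1, 1, 0, 0]
y_pred = [1, 1, 0, 0]

dpd = demographic_parity_difference(y_pred, groups)
weights = reweighing_weights(y_true, groups)
```

Weights like these are typically passed to a learner at training time, for example through the `sample_weight` argument that many scikit-learn estimators accept in `fit`. Equal weighting corresponds to a vector of ones, and an evolved scheme would instead search over the weight vector directly.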
