Fairness Shields: Safeguarding against Biased Decision Makers

Filip Cano, Thomas A. Henzinger, Bettina Könighofer, Konstantin Kueffner, Kaushik Mallik

TL;DR

This work introduces fairness shields, run-time intervention mechanisms that monitor sequential AI decisions to enforce a fairness criterion while minimizing intervention costs. It formalizes bounded-horizon and periodic fairness as DoR-based constraints and develops FinHzn shield synthesis via dynamic programming, along with static and dynamic periodic shielding strategies. The paper shows that these shields can be synthesized efficiently using counters that track fairness statistics, and demonstrates empirically that shields significantly reduce bias with modest utility losses across diverse datasets and models. The approach provides verifiable guarantees against biased decisions by deployed sequential decision-makers, compares static and dynamic shielding, and points to future extensions to broader fairness notions and human-in-the-loop settings.

Abstract

As AI-based decision-makers increasingly influence human lives, it is a growing concern that their decisions are often unfair or biased with respect to people's sensitive attributes, such as gender and race. Most existing bias prevention measures provide probabilistic fairness guarantees in the long run, and it is possible that the decisions are biased on specific instances of short decision sequences. We introduce fairness shielding, where a symbolic decision-maker -- the fairness shield -- continuously monitors the sequence of decisions of another deployed black-box decision-maker, and makes interventions so that a given fairness criterion is met while the total intervention costs are minimized. We present four different algorithms for computing fairness shields, among which one guarantees fairness over fixed horizons, and three guarantee fairness periodically after fixed intervals. Given a distribution over future decisions and their intervention costs, our algorithms solve different instances of bounded-horizon optimal control problems with different levels of computational costs and optimality guarantees. Our empirical evaluation demonstrates the effectiveness of these shields in ensuring fairness while maintaining cost efficiency across various scenarios.

Paper Structure

This paper contains 55 sections, 10 theorems, 62 equations, 10 figures, 11 tables.

Key Result

Theorem 1

The bounded-horizon shield-synthesis problem can be solved in $\mathcal{O}(|R_{\mu, T}|\cdot |\mathcal{X}|)$-time and $\mathcal{O}(|R_{\mu,T}|\cdot |\mathcal{X}|)$-space.
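To make the flavor of bounded-horizon synthesis concrete, the following is a minimal sketch of a dynamic program over counter states. It is not the paper's algorithm: the two groups, the demographic-parity check, the constant flip cost, and all names and numbers (`T`, `KAPPA`, `P_GROUP`, `P_ACCEPT`, `FLIP_COST`, `value`, `shield`) are illustrative assumptions. It only shows how tracking a few counters can be enough to pick, at each step, the cheapest decision that keeps a fair outcome at the horizon reachable.

```python
from functools import lru_cache

# All constants below are hypothetical, chosen only for illustration.
T = 6                             # horizon: fairness is checked after T decisions
KAPPA = 0.25                      # allowed gap between group acceptance rates
P_GROUP = {"a": 0.5, "b": 0.5}    # assumed distribution of the next individual's group
P_ACCEPT = {"a": 0.8, "b": 0.3}   # assumed acceptance rates of the black-box classifier
FLIP_COST = 1.0                   # assumed constant cost of overriding one decision
INF = float("inf")

def is_fair(na, ka, nb, kb):
    """Demographic-parity check on counters (seen, accepted) per group."""
    if na == 0 or nb == 0:
        return True  # assumption: an unseen group counts as a zero gap
    return abs(ka / na - kb / nb) <= KAPPA

def successor(state, group, out):
    """Update the counters after emitting decision `out` (1 = accept) for `group`."""
    na, ka, nb, kb = state
    if group == "a":
        return (na + 1, ka + out, nb, kb)
    return (na, ka, nb + 1, kb + out)

@lru_cache(maxsize=None)
def value(state):
    """Minimal expected future intervention cost; INF if fairness is unreachable."""
    na, _, nb, _ = state
    if na + nb == T:
        return 0.0 if is_fair(*state) else INF
    total = 0.0
    for group, pg in P_GROUP.items():
        for rec, pr in ((1, P_ACCEPT[group]), (0, 1.0 - P_ACCEPT[group])):
            if pg * pr > 0.0:  # skip impossible inputs to avoid 0 * inf
                total += pg * pr * min(q(state, group, rec, o) for o in (0, 1))
    return total

def q(state, group, rec, out):
    """Cost of emitting `out` when the classifier recommended `rec`, plus cost-to-go."""
    cost = 0.0 if out == rec else FLIP_COST
    return cost + value(successor(state, group, out))

def shield(state, group, rec):
    """Return the cheapest safe decision and the updated counters."""
    out = min((rec, 1 - rec), key=lambda o: q(state, group, rec, o))
    return out, successor(state, group, out)

# Demo: a maximally biased classifier that accepts group a and rejects group b.
state = (0, 0, 0, 0)
for i in range(T):
    group = "a" if i % 2 == 0 else "b"
    rec = 1 if group == "a" else 0
    out, state = shield(state, group, rec)
print(state, is_fair(*state))
```

In this sketch the number of memoized entries grows with the number of reachable counter tuples, which loosely mirrors why a bound in terms of a counter-state set is plausible. The exact meaning of $R_{\mu,T}$ and $\mathcal{X}$ in Theorem 1 is defined in the paper, not here.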

Figures (10)

  • Figure 1: The operational diagram of fairness shields.
  • Figure 2: Variations of bias over time for the ERM classifier on the Adult dataset with and without periodic shielding.
  • Figure 3: Resource usage for shield synthesis with increasing time horizons.
  • Figure 4: Regression plot depicting the relationship between utility loss and cost for various $\kappa$ for each dataset: DP (left) and EqOpp (right).
  • Figure 5: Distribution of normalized bias, i.e. Bias / $\kappa$, across all runs with (left) and without shield (right) for both DP and EqOpp.
  • ...and 5 more figures

Theorems & Definitions (33)

  • Example 1: Running example - Bilingual team
  • Example 2: Continuation of Ex. 1
  • Example 3: Continuation of Ex. 2
  • Example 4: Continuation of Ex. 3
  • Definition 1: FinHzn shields
  • Definition 2: Optimal $T$-periodic fairness shield
  • Theorem 1
  • Definition 3: Static-Fair shields
  • Example 5
  • Definition 4: Balanced traces
  • ...and 23 more