Fairness Shields: Safeguarding against Biased Decision Makers
Filip Cano, Thomas A. Henzinger, Bettina Könighofer, Konstantin Kueffner, Kaushik Mallik
TL;DR
This work introduces fairness shields, run-time intervention mechanisms that monitor the sequential decisions of an AI decision-maker and intervene to enforce a fairness criterion while minimizing intervention costs. It formalizes bounded-horizon and periodic fairness as DoR-based constraints and develops FinHzn shield synthesis via dynamic programming, along with static and dynamic periodic shielding strategies. The synthesis methods are computationally efficient because they rely on counters that track fairness statistics, and the empirical evaluation shows that shields significantly reduce bias with modest utility losses across diverse datasets and models. The approach provides practical, verifiable guarantees against biased decisions in deployed sequential decision-makers, gives clear guidance on when to use static versus dynamic shielding, and points to future extensions to broader fairness notions and human-in-the-loop settings.
Abstract
As AI-based decision-makers increasingly influence human lives, it is a growing concern that their decisions are often unfair or biased with respect to people's sensitive attributes, such as gender and race. Most existing bias prevention measures provide probabilistic fairness guarantees in the long run, and it is possible that the decisions are biased on specific instances of short decision sequences. We introduce fairness shielding, where a symbolic decision-maker -- the fairness shield -- continuously monitors the sequence of decisions of another deployed black-box decision-maker, and makes interventions so that a given fairness criterion is met while the total intervention costs are minimized. We present four different algorithms for computing fairness shields, among which one guarantees fairness over fixed horizons, and three guarantee fairness periodically after fixed intervals. Given a distribution over future decisions and their intervention costs, our algorithms solve different instances of bounded-horizon optimal control problems with different levels of computational costs and optimality guarantees. Our empirical evaluation demonstrates the effectiveness of these shields in ensuring fairness while maintaining cost efficiency across various scenarios.
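To make the bounded-horizon idea concrete, the following is a minimal Python sketch of a shield computed by dynamic programming over fairness counters. It is an illustration under stated assumptions, not the paper's algorithm: it assumes demographic parity between two groups as the fairness criterion, a uniform cost `flip_cost` for every intervention, a known input distribution (`p_group_a`, `p_accept`), and the convention that an empty group imposes no constraint. The names `make_shield`, `value`, and `act` are hypothetical.

```python
from functools import lru_cache
from math import inf


def make_shield(horizon, kappa, p_group_a, p_accept, flip_cost=1.0):
    """Return (value, act) for a toy finite-horizon fairness shield.

    State = counters (n_a, acc_a, n_b, acc_b): individuals seen and
    accepted per group. All modelling choices here are assumptions.
    """

    def fair(na, ka, nb, kb):
        # Assumed convention: an empty group imposes no constraint.
        if na == 0 or nb == 0:
            return True
        # Demographic parity: acceptance rates differ by at most kappa.
        return abs(ka / na - kb / nb) <= kappa + 1e-12

    def step(state, group, accept):
        na, ka, nb, kb = state
        if group == "a":
            return (na + 1, ka + accept, nb, kb)
        return (na, ka, nb + 1, kb + accept)

    @lru_cache(maxsize=None)
    def value(state):
        """Expected minimal future intervention cost from `state`."""
        na, _, nb, _ = state
        if na + nb == horizon:
            # Unfair end states are forbidden (infinite cost).
            return 0.0 if fair(*state) else inf
        total = 0.0
        for group, pg in (("a", p_group_a), ("b", 1.0 - p_group_a)):
            for accept in (0, 1):
                pr = p_accept[group] if accept else 1.0 - p_accept[group]
                prob = pg * pr
                if prob == 0.0:
                    continue  # avoid 0 * inf
                keep = value(step(state, group, accept))
                flip = flip_cost + value(step(state, group, 1 - accept))
                total += prob * min(keep, flip)
        return total

    def act(state, group, accept):
        """Shielded decision: keep the recommendation unless flipping is cheaper."""
        keep = value(step(state, group, accept))
        flip = flip_cost + value(step(state, group, 1 - accept))
        return accept if keep <= flip else 1 - accept

    return value, act


if __name__ == "__main__":
    # A maximally biased toy classifier: always accepts group a, never group b.
    value, act = make_shield(
        horizon=2, kappa=0.0, p_group_a=0.5, p_accept={"a": 1.0, "b": 0.0}
    )
    print(value((0, 0, 0, 0)))        # expected intervention cost from the start
    print(act((1, 1, 0, 0), "b", 0))  # shielded decision for a rejected b-individual
```

In this sketch the shield only intervenes when keeping the recommendation would lead to a higher expected cost, including the infinite cost of ending the horizon in an unfair state. Because the state is just four counters, the table has polynomially many entries in the horizon, which is the kind of counter-based efficiency the summary above alludes to.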
