Diversity Drives Fairness: Ensemble of Higher Order Mutants for Intersectional Fairness of Machine Learning Software
Zhenpeng Chen, Xinyue Li, Jie M. Zhang, Federica Sarro, Yang Liu
TL;DR
FairHOME tackles intersectional fairness in ML software by applying higher-order mutation to the protected attributes of each input at inference time. This produces mutants that represent diverse subgroups, and FairHOME then ensembles the same model's predictions for the original input and its mutants. Because it requires no retraining, it applies even to already-deployed systems, and it uses majority voting as a simple, effective default ensemble strategy. In an empirical study covering 24 decision-making tasks and 7 baseline methods, FairHOME improves intersectional fairness by 47.5% on average at only a small cost in ML performance (0.1%–2.7%). It also achieves the strongest fairness-performance trade-off among the compared methods, as measured with the Fairea benchmark. In addition, it improves single-attribute group fairness and remains robust across ensemble strategies and mutation settings, making it a practical, scalable bias-mitigation technique for real-world ML software.
Abstract
Intersectional fairness is a critical requirement for Machine Learning (ML) software, demanding fairness across subgroups defined by multiple protected attributes. This paper introduces FairHOME, a novel ensemble approach using higher order mutation of inputs to enhance intersectional fairness of ML software during the inference phase. Inspired by social science theories highlighting the benefits of diversity, FairHOME generates mutants representing diverse subgroups for each input instance, thus broadening the array of perspectives to foster a fairer decision-making process. Unlike conventional ensemble methods that combine predictions made by different models, FairHOME combines predictions for the original input and its mutants, all generated by the same ML model, to reach a final decision. Notably, FairHOME is even applicable to deployed ML software as it bypasses the need for training new models. We extensively evaluate FairHOME against seven state-of-the-art fairness improvement methods across 24 decision-making tasks using widely adopted metrics. FairHOME consistently outperforms existing methods across all metrics considered. On average, it enhances intersectional fairness by 47.5%, surpassing the currently best-performing method by 9.6 percentage points.
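The inference-time procedure described above (mutate the protected attributes of an input, query the same model on the original and every mutant, then combine the predictions by majority vote) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `generate_mutants`, `ensemble_predict`, and `toy_model`, the attribute names `sex`, `race`, and `income`, and the tie-breaking rule are all hypothetical. The sketch also assumes that mutants are formed by enumerating every other combination of protected-attribute values, which yields both first-order and higher-order mutants.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Mapping, Sequence

Instance = Dict[str, int]
Model = Callable[[Mapping[str, int]], int]

# Hypothetical protected attributes and their possible (encoded) values.
PROTECTED: Dict[str, List[int]] = {"sex": [0, 1], "race": [0, 1]}


def generate_mutants(x: Mapping[str, int],
                     protected: Mapping[str, Sequence[int]]) -> List[Instance]:
    """Return copies of x whose protected attributes take every other value
    combination (assumption: full enumeration, so multi-attribute mutants
    are included alongside single-attribute ones)."""
    names = list(protected)
    original = tuple(x[n] for n in names)
    mutants: List[Instance] = []
    for combo in product(*(protected[n] for n in names)):
        if combo == original:
            continue  # the unmodified input is not a mutant
        mutant = dict(x)
        mutant.update(zip(names, combo))
        mutants.append(mutant)
    return mutants


def ensemble_predict(model: Model, x: Mapping[str, int],
                     protected: Mapping[str, Sequence[int]]) -> int:
    """Majority vote over the same model's predictions for x and its mutants."""
    original_pred = model(x)
    votes = Counter([original_pred] +
                    [model(m) for m in generate_mutants(x, protected)])
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    # Hypothetical tie-break: keep the original prediction if it is tied for first.
    return original_pred if original_pred in winners else min(winners)


def toy_model(x: Mapping[str, int]) -> int:
    """A deliberately biased stand-in model (hypothetical): lower income
    threshold for applicants with sex == 1 or race == 1."""
    return int(x["income"] >= 70 or (x["income"] >= 50 and (x["sex"] == 1 or x["race"] == 1)))


if __name__ == "__main__":
    applicant = {"income": 60, "sex": 0, "race": 0}
    print("single model:", toy_model(applicant))
    print("ensemble:", ensemble_predict(toy_model, applicant, PROTECTED))
```

For the example applicant, the biased model alone rejects the input, while three of the four subgroup variants are approved, so the vote approves it. Because this sketch enumerates all value combinations, every input votes over the same set of subgroups regardless of its own protected values. As a result, the protected attributes can influence the outcome only through the tie-break.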
