Feature Importance Disparities for Data Bias Investigations
Peter W. Chang, Leor Fishman, Seth Neel
TL;DR
This work introduces feature importance disparity ($\text{FID}$) as a data-centric diagnostic for data bias investigations (DBI), assessing how a feature's influence differs between a subgroup and the overall population. By formalizing $\text{FID}$ with separable feature importance notions and proposing an oracle-efficient optimization using a Cost-Sensitive Classification (CSC) oracle, the authors efficiently identify high-$\text{AVG-FID}$ subgroups even in exponentially large subgroup spaces. Empirically, across four datasets and multiple feature importance methods (LIME, SHAP, GRAD, LIN-FID), subgroups with large $\text{AVG-FID}$ are found, often aligning with fairness-metric disparities and generalizing out-of-sample; rich subgroups frequently yield larger disparities than marginal ones. The method offers a practical toolkit for DBI, enabling targeted interventions such as subgroup-specific modeling or data-collection investigations, while acknowledging limitations in explanation stability and the need for predefined sensitive features. The work thus complements fairness research with a data-centric perspective on bias sources in tabular data.
Abstract
It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features with bias in the collection process, or even conducting real-world experiments to ascertain sources of bias. Despite the need for such data bias investigations, few automated methods exist to assist practitioners in these efforts. In this paper, we present one such method that, given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ given $X$, outputs a tuple $(f_j, g)$ with the following property: $g$ corresponds to a subset of the training dataset $(X, y)$ such that the $j^{\text{th}}$ feature $f_j$ has much larger (or smaller) influence in the subgroup $g$ than on the dataset overall. We call this difference the feature importance disparity (FID). We show across $4$ datasets and $4$ common feature importance methods of broad interest to the machine learning community that we can efficiently find subgroups with large FID values even over exponentially large subgroup classes, and that, in practice, these groups correspond to subgroups with potentially serious bias issues as measured by standard fairness metrics.
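To make the output $(f_j, g)$ concrete, the following is a minimal sketch and not the authors' implementation. It assumes a separable importance notion, so that per-point importances are already available as a matrix, and it takes the average-case disparity to be the absolute difference between the mean importance inside the subgroup and the mean importance over the whole dataset. It also replaces the oracle-efficient CSC-based search with a brute-force scan over marginal subgroups (a single protected attribute fixed to a single value). The names `avg_fid` and `best_marginal_subgroup` and the toy arrays are hypothetical.

```python
import numpy as np


def avg_fid(local_importance, group_mask):
    """Absolute gap between mean importance in the subgroup and overall.

    local_importance: shape (n,), per-point importance of one feature
                      (assumed precomputed, e.g. by LIME or SHAP).
    group_mask:       shape (n,), boolean membership in the subgroup g.
    """
    local_importance = np.asarray(local_importance, dtype=float)
    group_mask = np.asarray(group_mask, dtype=bool)
    if not group_mask.any():
        raise ValueError("subgroup is empty")
    return abs(local_importance[group_mask].mean() - local_importance.mean())


def best_marginal_subgroup(importances, protected):
    """Brute-force search over marginal subgroups (NOT the CSC-oracle method).

    importances: shape (n, d), per-point importances for d features.
    protected:   shape (n, k), categorical protected attributes.
    Returns (disparity, feature index j, (protected column, value)).
    """
    best = (0.0, None, None)
    for j in range(importances.shape[1]):
        for c in range(protected.shape[1]):
            for v in np.unique(protected[:, c]):
                mask = protected[:, c] == v
                d = avg_fid(importances[:, j], mask)
                if d > best[0]:
                    best = (d, j, (c, v))
    return best


# Toy data (hypothetical): feature 0 matters far more for the last point,
# which is the only member of protected group 1; feature 1 is constant.
importances = np.array([[0.1, 0.5],
                        [0.2, 0.5],
                        [0.3, 0.5],
                        [1.0, 0.5]])
protected = np.array([[0], [0], [0], [1]])

disparity, feature_j, (column, value) = best_marginal_subgroup(importances, protected)
print(disparity, feature_j, column, value)
```

A brute-force scan of this kind is only feasible for marginal subgroups. The rich subgroup classes described above are exponentially large, which is why the paper relies on an oracle-efficient optimization instead.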
