Understanding Disparities in Post Hoc Machine Learning Explanation

Vishwali Mhasawade; Salman Rahman; Zoe Haskell-Craig; Rumi Chunara

TL;DR

This paper investigates why fidelity gaps in post hoc explanations (notably LIME) occur across sensitive subgroups. It introduces a data-generating process described by a causal DAG and four objectives that probe sample size, covariate shift, concept shift, and omitted variable bias, using both linear and neural-network predictors on synthetic data and the Adult dataset. The results show that data properties and model complexity shape the fidelity gaps: covariate shift, concept shift, and omitted variables amplify disparities, often more so for neural networks, and including the sensitive attribute so that the model aligns with the causal structure can either mitigate or exacerbate disparities depending on the scenario. The work offers practical recommendations for designing explanation methods and argues for benchmark datasets to evaluate disparity across explainers and data-generating conditions beyond LIME.

Abstract

Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity (across 'race' and 'gender' as sensitive attributes), and while a large body of work focuses on mitigating these issues at the explanation metric level, the role of the data generating process and black box model in relation to explanation disparities remains largely unexplored. Accordingly, through both simulations and experiments on a real-world dataset, we specifically assess sources of explanation disparities that originate from properties of the data (limited sample size, covariate shift, concept shift, and omitted variable bias) and from properties of the model (inclusion of the sensitive attribute and appropriate functional form). Through controlled simulation analyses, our study demonstrates that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect more pronounced for neural network models, which are better able to capture the underlying functional form than linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.
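
To make the notion of a fidelity gap concrete, the following is a minimal sketch of how the two disparity metrics named in the section outline (maximum fidelity gap from average, and mean fidelity gap amongst subgroups) could be computed for accuracy-based fidelity. The formulas are an assumption based on the common formulation in the fidelity-gap literature and are not quoted from this page; the function names `group_fidelity`, `max_fidelity_gap`, and `mean_fidelity_gap` are hypothetical.

```python
from itertools import combinations

import numpy as np


def group_fidelity(model_pred, expl_pred, groups):
    """Accuracy-based fidelity: share of points where the explainer's
    prediction matches the black-box model's prediction.
    Returns (per-group fidelity dict, overall fidelity)."""
    model_pred = np.asarray(model_pred)
    expl_pred = np.asarray(expl_pred)
    groups = np.asarray(groups)
    agree = (model_pred == expl_pred).astype(float)
    per_group = {
        g: float(agree[groups == g].mean())
        for g in np.unique(groups).tolist()
    }
    return per_group, float(agree.mean())


def max_fidelity_gap(model_pred, expl_pred, groups):
    """Assumed definition: largest shortfall of any subgroup's fidelity
    relative to the overall (average) fidelity."""
    per_group, overall = group_fidelity(model_pred, expl_pred, groups)
    return max(overall - v for v in per_group.values())


def mean_fidelity_gap(model_pred, expl_pred, groups):
    """Assumed definition: mean absolute fidelity difference over all
    unordered pairs of subgroups."""
    per_group, _ = group_fidelity(model_pred, expl_pred, groups)
    pairs = list(combinations(per_group.values(), 2))
    if not pairs:
        return 0.0
    return sum(abs(u - v) for u, v in pairs) / len(pairs)


# Toy example: the explainer agrees with the model on every A = 1 point
# but on only half of the A = 0 points.
model_pred = [1, 1, 0, 0, 1, 0, 1, 1]
expl_pred = [1, 1, 0, 0, 0, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

print(max_fidelity_gap(model_pred, expl_pred, groups))   # 0.25
print(mean_fidelity_gap(model_pred, expl_pred, groups))  # 0.5
```

In practice `expl_pred` would come from the local surrogate fitted by the explainer (for example LIME) around each test point; the toy arrays here only stand in for those predictions.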

Paper Structure (21 sections, 3 equations, 7 figures, 2 tables)

Introduction
Related Work
Data Generating Process and Objectives
Data Generating Process
Objective 1: effect of sample size of disadvantaged group data used for training
Objective 2: effect of covariate shift in disadvantaged group data between training and test distributions
Objective 3: effect of concept shift
Objective 4: effect of the magnitude of direct effect of the omitted covariate
Methods
Notation
Explanation Quality Metrics
Maximum Fidelity Gap from Average
Mean Fidelity Gap Amongst Subgroups
Experimental Setup
Results
...and 6 more sections

Figures (7)

Figure 1: Causal DAG for the synthetic datasets (a, b) and the Adult dataset (c). In (a) is the causal graph describing the data-generating process (DGP) for objectives 1, 2, and 4, (b) is the causal graph for the DGP for objective 3 (concept shift). The concept shift is represented as the arrow showing the effect of $A$ on the relationship between $L$ and $Y$, such that $P(Y|L, A = 0) \ne P(Y|L, A = 1)$. In (c) we consider gender as the sensitive attribute of interest. $A$ and $M$ represent gender and marital status, respectively. $C$ is age and nationality, $L$ is the level of education, $R$ corresponds to the working class, occupation, and hours per week, and $Y$ is the income class.
Figure 2: Percent Mean Fidelity Gap (Accuracy) of LIME applied to models built on the synthetic datasets generated for (a) objective 1 - sample size, (b) objective 2 - covariate shift, (c) objective 3 - concept shift, and (d) objective 4 - omitted variables. In (a), we vary the proportion of the disadvantaged group in the training set sample. In (b), we introduce a covariate shift for the disadvantaged group, shifting the overlap between the train and test distributions; and in (c), we vary the magnitude of the concept shift. In (d), we adjust the strength of the direct effect of the omitted variable $C$. The models that are considered are LR with $A$, LR$_{A}$ in blue, LR without $A$, LR$_{\not \mathbf{A}}$ in green, NN with $A$, NN$_{A}$ in red, and NN without $A$, NN$_{\not \mathbf{A}}$ in violet, LR without $C$, LR$_{\not C}$ in yellow, and NN without $C$, NN$_{\not C}$ in plum. Circles represent linear models, and triangles represent neural network models. Notice that the magnitude of the mean fidelity gap is much larger under conditions of (b) covariate shift and (c) concept shift than either (a) sample size differences or (d) omitted variables.
Figure 3: Percent Maximum Fidelity Gap of LIME applied to models built on the synthetic datasets generated for (a) objective 1 - sample size, (b) objective 2 - covariate shift, (c) objective 3 - concept shift, and (d) objective 4 - omitted variables for LR with $A$, LR$_{A}$ in blue, LR without $A$, LR$_{\not \mathbf{A}}$ in green, NN with $A$, NN$_{A}$ in red, and NN without $A$, NN$_{\not \mathbf{A}}$ in violet, LR without $C$, LR$_{\not C}$ in yellow, and NN without $C$, NN$_{\not C}$ in plum. Circles represent linear models, and triangles represent neural network models.
Figure 4: Performance disparity of $f()$, calculated as accuracy for $A = 1$ minus accuracy for $A = 0$, on the synthetic datasets generated with an increasing (a) proportion of the disadvantaged sample (objective 1), (b) overlap between the training and test distributions of $L$ for $A=0$, (c) concept shift, and (d) direct effect of omitted variable $C$ for LR with $A$, LR$_{A}$ in blue, LR without $A$, LR$_{\not \mathbf{A}}$ in green, NN with $A$, NN$_{A}$ in red, and NN without $A$, NN$_{\not \mathbf{A}}$ in violet. Circles represent linear models, and triangles represent neural network models.
Figure 5: (a) Percent Maximum Fidelity Gap, $\Delta_{Acc}$, and (b) mean fidelity gap in accuracy, $\Delta_{Acc}^{group}$, of LIME on the Adult dataset with variation in the proportion of males ($A$) in the training sample (objective 1) for LR with $A$, LR$_{A}$ in blue, LR without $A$, LR$_{\not \mathbf{A}}$ in green, NN with $A$, NN$_{A}$ in red, and NN without $A$, NN$_{\not \mathbf{A}}$ in violet. Circles represent linear models, and triangles represent neural network models.
...and 2 more figures
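
The captions for Figures 1 and 4 describe two ideas that a short simulation can illustrate: a concept shift, where $P(Y|L, A = 0) \ne P(Y|L, A = 1)$, and the performance disparity, accuracy for $A = 1$ minus accuracy for $A = 0$. The sketch below is illustrative only. The logistic form, the base slope of 2.0, the `shift` parameter, and the threshold predictor standing in for $f()$ are all assumptions made for this example and are not the paper's data-generating process or models.

```python
import numpy as np


def simulate_concept_shift(n, shift, seed=0):
    """Draw (A, L, Y) where the effect of L on Y depends on A.

    Assumed form (not from the paper): Y ~ Bernoulli(sigmoid(slope * L)),
    with slope = 2.0 for A = 1 and slope = 2.0 - shift for A = 0, so that
    shift = 0 means no concept shift.
    """
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)           # sensitive attribute A
    l = rng.normal(size=n)                   # covariate L
    slope = np.where(a == 1, 2.0, 2.0 - shift)
    p = 1.0 / (1.0 + np.exp(-slope * l))     # P(Y = 1 | L, A)
    y = rng.binomial(1, p)
    return a, l, y


def performance_disparity(y_true, y_pred, a):
    """Accuracy for A = 1 minus accuracy for A = 0 (as in Figure 4)."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    acc1 = np.mean(y_pred[a == 1] == y_true[a == 1])
    acc0 = np.mean(y_pred[a == 0] == y_true[a == 0])
    return float(acc1 - acc0)


# A fixed threshold rule stands in for a model that ignores A.
for shift in (0.0, 1.0, 2.0):
    a, l, y = simulate_concept_shift(20_000, shift, seed=1)
    y_hat = (l > 0).astype(int)
    print(shift, round(performance_disparity(y, y_hat, a), 3))
```

Under these assumptions the disparity stays near zero without concept shift and grows as `shift` increases, because a single rule on $L$ that ignores $A$ fits the $A = 0$ group progressively worse.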
