Suboptimal Shapley Value Explanations

Xiaolei Lu

TL;DR

The paper tackles suboptimal baselines in Shapley value explanations for black-box models, showing that replacement-based baselines can introduce asymmetric feature interactions that bias attribution. It introduces an uncertainty-based reweighting mechanism and a principled approach to minimize asymmetric interactions by driving $p(y|\mathbf{x}'_i)$ toward $p(y)$ or maximizing $H(Y|\mathbf{x}'_i)$, thereby accelerating computation and improving faithfulness. Through extensive NLP experiments on BERT-base and RoBERTa-base across SST-2, SNLI, Yelp-2, SNIPS, and 20Newsgroup, the method (random-uw and condition-uw) consistently enhances explanation fidelity as measured by LOR, SF, and CM, while in-distribution baselines further bolster results. Human and GPT-4 evaluations reveal a remaining gap between model-inferred explanations and human understanding, underscoring the need for more interpretable alignments in practice. Overall, the work offers a practical, theory-grounded path to more faithful Shapley-based explanations with faster computation for deep NLP models.
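For reference, the quantities named above can be written out explicitly. The first line is the standard Shapley value definition; the second is the entropy criterion, using the general fact that entropy over a finite label set is bounded by the log of its size. The symbols $v$ (value function), $N$ (feature set), $n = |N|$, and $L$ (label space) are notation assumed here, not quoted from the paper.

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
$$

$$
H(Y \mid \mathbf{x}'_i) = -\sum_{y \in L} p(y \mid \mathbf{x}'_i)\,\log p(y \mid \mathbf{x}'_i) \;\le\; \log |L|
$$

Equality in the second line holds exactly when $p(y \mid \mathbf{x}'_i)$ is uniform over $L$, so the entropy criterion can be evaluated from the model's output alone, without an estimate of $p(y)$.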

Abstract

Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. The Shapley value has emerged as a prominent tool for analyzing feature importance and helping people understand the inference process of deep neural models. Computing the Shapley value function requires choosing a baseline to represent a feature's missingness. However, existing random and conditional baselines can negatively influence the explanation. In this paper, by analyzing the suboptimality of different baselines, we identify the problematic baseline, in which the asymmetric interaction between $\mathbf{x}'_i$ (the replacement of the faithful influential feature) and other features has a significant directional bias toward the model's output, and conclude that $p(y|\mathbf{x}'_i) = p(y)$ potentially minimizes the asymmetric interaction involving $\mathbf{x}'_i$. We further generalize the uninformativeness of $\mathbf{x}'_i$ toward the label space $L$ to avoid estimating $p(y)$, and design a simple uncertainty-based reweighting mechanism to accelerate the computation process. We conduct experiments on various NLP tasks, and our quantitative analysis demonstrates the effectiveness of the proposed uncertainty-based reweighting mechanism. Furthermore, by measuring the consistency between explanations generated by explanation methods and those given by humans, we highlight the disparity between model inference and human understanding.
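As an illustration of the idea of preferring uninformative replacements, here is a minimal sketch in which candidate replacement tokens are weighted by the entropy of the model's prediction given that token alone. The abstract does not state the exact reweighting formula, so the normalization below, the function names (`entropy`, `uncertainty_weights`, `sample_replacement`), and the toy probability table are all assumptions made for illustration, not the paper's implementation.

```python
import math
import random
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def uncertainty_weights(
    candidates: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
) -> List[float]:
    """Hypothetical reweighting: weight each candidate by the entropy of
    p(y | candidate), normalized so the weights sum to 1."""
    ents = [entropy(predict_proba(c)) for c in candidates]
    total = sum(ents)
    if total == 0.0:
        # Every candidate is fully informative; fall back to uniform weights.
        return [1.0 / len(candidates)] * len(candidates)
    return [e / total for e in ents]


def sample_replacement(
    candidates: Sequence[str], weights: Sequence[float], rng: random.Random
) -> str:
    """Draw one replacement token according to the given weights."""
    return rng.choices(list(candidates), weights=list(weights), k=1)[0]


# Toy stand-in for a classifier's p(y | token) over two labels (made-up numbers).
toy_table = {
    "the": [0.5, 0.5],      # uninformative: maximal entropy
    "great": [0.95, 0.05],  # strongly tied to one label
    "awful": [0.0, 1.0],    # fully determines the label
}
tokens = ["the", "great", "awful"]
weights = uncertainty_weights(tokens, lambda t: toy_table[t])
replacement = sample_replacement(tokens, weights, random.Random(0))
```

In this toy setup the uninformative token receives the largest weight and the fully label-determining token receives zero weight, so it is never drawn as a replacement.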
