Suboptimal Shapley Value Explanations
Xiaolei Lu
TL;DR
The paper tackles suboptimal baselines in Shapley value explanations for black-box models, showing that replacement-based baselines can introduce asymmetric feature interactions that bias attribution. It argues that these interactions are minimized when $p(y|\bm{x}'_i)$ is driven toward $p(y)$, generalizes this condition to maximizing $H(Y|\bm{x}'_i)$ so that $p(y)$ need not be estimated, and introduces an uncertainty-based reweighting mechanism that accelerates computation. In NLP experiments with BERT-base and RoBERTa-base on SST-2, SNLI, Yelp-2, SNIPS, and 20Newsgroup, the reweighted variants (random-uw and condition-uw) consistently improve explanation faithfulness as measured by LOR, SF, and CM, and in-distribution baselines improve results further. Human and GPT-4 evaluations reveal a remaining gap between model-inferred explanations and human understanding.
Abstract
Deep Neural Networks (DNNs) have demonstrated strong capacity in supporting a wide variety of applications. The Shapley value has emerged as a prominent tool for analyzing feature importance and helping people understand the inference process of deep neural models. Computing the Shapley value function requires choosing a baseline to represent a feature's missingness. However, existing random and conditional baselines can negatively influence the explanation. In this paper, by analyzing the suboptimality of different baselines, we identify the problematic baseline, in which the asymmetric interaction between $\bm{x}'_i$ (the replacement of the faithful influential feature) and other features has a significant directional bias toward the model's output, and conclude that $p(y|\bm{x}'_i) = p(y)$ potentially minimizes the asymmetric interaction involving $\bm{x}'_i$. We further generalize the uninformativeness of $\bm{x}'_i$ toward the label space $L$ to avoid estimating $p(y)$, and design a simple uncertainty-based reweighting mechanism to accelerate the computation process. We conduct experiments on various NLP tasks, and our quantitative analysis demonstrates the effectiveness of the proposed uncertainty-based reweighting mechanism. Furthermore, by measuring the consistency between explanations generated by explanation methods and those given by humans, we highlight the disparity between model inference and human understanding.
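To make the idea of uncertainty-based reweighting concrete, the following is a minimal sketch rather than the paper's actual procedure. It assumes that each candidate replacement $\bm{x}'_i$ comes with a model-predicted label distribution, scores each candidate by its predictive entropy (a stand-in for $H(Y|\bm{x}'_i)$), and normalizes the scores into weights so that less informative replacements contribute more to the baseline average. The function names, the entropy-proportional weighting, and the uniform fallback are all illustrative assumptions.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each label distribution along the last axis."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def uncertainty_weights(probs):
    """Hypothetical weighting: weight each candidate replacement in
    proportion to the entropy of the model's prediction on it.

    probs has shape (n_candidates, n_labels). Higher entropy means the
    replacement tells the model less about the label, so it gets more weight.
    """
    h = np.clip(predictive_entropy(probs), 0.0, None)  # guard tiny negatives
    total = h.sum()
    if total <= 0.0:
        # Assumption: fall back to uniform weights when every candidate
        # yields a fully confident prediction.
        return np.full(h.shape, 1.0 / h.size)
    return h / total


def reweighted_baseline_output(outputs, weights):
    """Weighted average of model outputs over the candidate replacements."""
    return float(np.dot(np.asarray(weights, dtype=float),
                        np.asarray(outputs, dtype=float)))


if __name__ == "__main__":
    # Three candidate replacements for one feature, binary label space.
    candidate_probs = np.array([[0.5, 0.5], [0.99, 0.01], [0.7, 0.3]])
    w = uncertainty_weights(candidate_probs)
    print("weights:", w)
    print("baseline output:", reweighted_baseline_output([0.5, 0.99, 0.7], w))
```

In this sketch the near-uniform candidate receives the largest weight and the confidently classified one receives almost none, which mirrors the abstract's preference for replacements that are uninformative about the label.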
