Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions

Zulqarnain Khan, Davin Hill, Aria Masoomi, Joshua Bone, Jennifer Dy

TL;DR

This work introduces Explainer Astuteness $A_{r,\lambda}$ to quantify robustness of post-hoc explanations and proves that the astuteness is lower-bounded by the predictor's probabilistic Lipschitzness. It derives theoretical bounds for SHAP, Remove Individual, and RISE explainers, showing that locally smooth predictors yield more astute explanations, with explicit $\lambda$-scaling such as $\lambda = 2 \sqrt[p]{d} \;L$ (SHAP) or $\lambda = \sqrt[p]{d} \;L$ (RISE). The authors validate the theory empirically across simulated and real datasets, demonstrating that Lipschitz-regularized models achieve higher astuteness at smaller $\lambda$, and they propose astuteness as a practical metric for explainer robustness. The findings suggest enforcing predictor smoothness as a viable strategy to improve explanation reliability, while acknowledging bound tightness and dataset-specific nuances. Formally, $A_{r,\lambda}(E, \mathcal{D})$ links to the predictor's local behavior via probabilistic Lipschitzness, highlighting a quantifiable path to robust explanations in complex models.
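As a rough illustration of the $\lambda$-scaling quoted above, the following Python sketch evaluates $\lambda = c \sqrt[p]{d}\, L$ with $c = 2$ for SHAP and $c = 1$ for RISE. The function name `lambda_scale` and the factor table are hypothetical helpers introduced here, not code from the paper, and only the two constants stated in this summary are included.

```python
def lambda_scale(L, d, p=2.0, explainer="shap"):
    """Return lambda = c * d**(1/p) * L for the constants quoted in the TL;DR.

    L         : Lipschitz constant of the predictor (assumed known or estimated)
    d         : number of input features
    p         : order of the p-norm distance d_p
    explainer : "shap" (c = 2) or "rise" (c = 1); other explainers are not covered
    """
    # Illustrative table: only the two factors stated in the summary text.
    factors = {"shap": 2.0, "rise": 1.0}
    if explainer not in factors:
        raise ValueError(f"no constant quoted for explainer {explainer!r}")
    return factors[explainer] * d ** (1.0 / p) * L


if __name__ == "__main__":
    # Example: 784 features (a flattened 28x28 image), p = 2, L = 1.
    for name in ("shap", "rise"):
        print(name, lambda_scale(1.0, 784, 2, name))
```

For $d = 784$, $p = 2$, and $L = 1$ this gives $\lambda$ of roughly 56 for SHAP and 28 for RISE, which shows how quickly the bound loosens as the feature count grows.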

Abstract

Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. As crucial diagnostics tools, it is important that these explainers themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
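A minimal sketch of how explainer astuteness might be estimated from a finite sample, assuming it is read as the probability that $d_p(E(x), E(x')) \leq \lambda\, d_p(x, x')$ given $d_p(x, x') \leq r$ (the reading suggested by the Figure 2 caption). The function `estimate_astuteness` and its arguments are hypothetical names, and the all-pairs estimate is an assumption rather than the authors' procedure.

```python
import numpy as np


def estimate_astuteness(X, explain, r, lam, p=2):
    """Empirical fraction of nearby pairs whose explanations stay lam-close.

    X       : array-like of shape (n, d), sample of inputs
    explain : callable mapping one input vector to an attribution vector
    r       : radius defining "nearby" pairs, d_p(x, x') <= r
    lam     : candidate Lipschitz constant for the explainer
    p       : order of the p-norm used for both distances
    """
    X = np.asarray(X, dtype=float)
    E = np.asarray([explain(x) for x in X], dtype=float)

    # All unordered pairs (i, j) with i < j; this is O(n^2) in the sample size.
    i, j = np.triu_indices(len(X), k=1)
    dx = np.linalg.norm(X[i] - X[j], ord=p, axis=1)
    de = np.linalg.norm(E[i] - E[j], ord=p, axis=1)

    close = dx <= r
    if not close.any():
        # No pair falls within radius r, so the conditional is undefined.
        return float("nan")
    return float(np.mean(de[close] <= lam * dx[close]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    # Toy "explainer" (assumption): a linear map, Lipschitz with constant 3.
    print(estimate_astuteness(X, lambda x: 3.0 * x, r=1.0, lam=4.0))
```

Sweeping `lam` over a grid with this estimator yields the kind of astuteness-versus-$\lambda$ curve the paper's figures describe.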

Paper Structure (20 sections, 6 theorems, 40 equations, 6 figures, 5 tables)

Key Result

Lemma 1

If […], then for $y = x \odot z_{+i}$, $y' = x' \odot z_{+i}$, i.e. $y, y' \in \cup \mathbb{N}_k = \{y \mid y \in \mathbb{R}^d, \|y\|_0 = k, y_i \neq 0\}$ for $k = 1, \ldots, d$, […] where $\beta \geq \alpha$, assuming that the distribution $\mathcal{D}$ is defined for all $x$ and $y$, and the equality is approached if the probability of sampling points from the set $\mathbb{N}_k = \{y \mid y \in \mathbb{R}^d, \|y\|_0 = k, y_i \ldots$

Figures (6)

  • Figure 1: Smoother black-box predictors lead to more astute explainers and more robust explanations. Samples from a simulated dataset are plotted in blue, with red arrows representing the respective post-hoc SHAP explanations plotted as vectors. A) When a neural network (NN) is trained with no Lipschitz constraints, explanations of nearby points can vary significantly, as evidenced by the arrows varying in length and direction. B) When the NN is retrained with the Lipschitz regularization suggested by Gouk et al. (2021), explanations are more aligned in length and direction, indicating higher robustness.
  • Figure 2: Summary of our theoretical results. (A) For a black-box prediction function that is locally Lipschitz with constant $L_1$, the predictions for any two points $x, x'$ (shown here with two similar images from the MNIST dataset) such that $d_p(x,x') \leq r$ are within $L_1 d_p(x,x')$ of each other. (B) Given such a prediction function, the explanations (feature attributions shown, with brighter colors indicating higher scores) for the same data points are also expected to be within $\lambda_1 d_p(x,x')$ of each other, where $\lambda_1 = C L_1 \sqrt{d}$ and $C$ is a constant. (C) For a second black-box model with $L_2 > L_1$, our results show that (D) $\lambda_2 > \lambda_1$, indicating that the explanations for this model can be farther apart than those for the first prediction function. This result implies that locally smooth black-box models lend themselves to more astute explainers.
  • Figure 3: Smooth functions result in astute explanations. Regularizing the Lipschitzness of a neural network during training results in higher astuteness for the same value of $\lambda$. Higher regularization results in a lower Lipschitz constant (Gouk et al., 2021). Astuteness reaches $1$ for smaller values of $\lambda$ with Lipschitz-regularized training, as expected from our theorems. The error bars represent results across 5 runs to account for randomness in explainer runs.
  • Figure 4: Astuteness as a metric. Different explainers display different levels of astuteness. In our experiments, RISE consistently displayed lower astuteness for the same value of $\lambda$ compared to SHAP and CXPlain. This indicates that among these three explainers, on the considered datasets and classifiers, RISE is the least robust and CXPlain is the most robust. Results are shown across 20 explainer runs.
  • Figure 5: This figure experimentally shows the implication of our theoretical results. It corresponds to the AUC values shown in Table \ref{table:auc}. For each combination of dataset, classifier, and explainer, we observe that the estimated explainer astuteness for SHAP, RISE, and CXPlain is lower-bounded by the astuteness predicted by our theoretical results for a given value of $\lambda$. The predicted lower bound is depicted by dashed lines, while solid lines depict the actual estimate of explainer astuteness.
  • ...and 1 more figures
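
The astuteness-versus-$\lambda$ curves and AUC values mentioned in the captions above can be sketched as follows. This is an illustrative assumption, not the paper's evaluation code: the helper names `astuteness_curve` and `curve_auc` are hypothetical, the pairwise distances are taken as precomputed inputs, and normalizing the area by the width of the $\lambda$ grid is a choice made here so that the value lies in $[0, 1]$.

```python
import numpy as np


def astuteness_curve(dx, de, r, lambdas):
    """Astuteness estimate at each lambda, from precomputed pair distances.

    dx      : distances d_p(x, x') between input pairs
    de      : distances d_p(E(x), E(x')) between the matching explanations
    r       : radius; only pairs with dx <= r are counted
    lambdas : increasing grid of lambda values
    """
    dx = np.asarray(dx, dtype=float)
    de = np.asarray(de, dtype=float)
    close = dx <= r  # assumes at least one pair lies within radius r
    return np.array([np.mean(de[close] <= lam * dx[close]) for lam in lambdas])


def curve_auc(lambdas, values):
    """Trapezoidal area under the curve, divided by the lambda range (assumption)."""
    lambdas = np.asarray(lambdas, dtype=float)
    values = np.asarray(values, dtype=float)
    area = np.sum((values[1:] + values[:-1]) * np.diff(lambdas) / 2.0)
    return float(area / (lambdas[-1] - lambdas[0]))


if __name__ == "__main__":
    # Toy distances (made up for illustration only).
    dx = [1.0, 1.0, 2.0, 10.0]
    de = [1.0, 3.0, 2.0, 0.0]
    grid = [0.0, 1.0, 2.0, 3.0, 4.0]
    curve = astuteness_curve(dx, de, r=5.0, lambdas=grid)
    print(curve, curve_auc(grid, curve))
```

A curve that reaches 1 at smaller $\lambda$ gives a larger normalized area, which is one way to compare explainers or regularization strengths with a single number.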

Theorems & Definitions (17)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Theorem 2
  • proof
  • ...and 7 more