Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions
Zulqarnain Khan, Davin Hill, Aria Masoomi, Joshua Bone, Jennifer Dy
TL;DR
This work introduces explainer astuteness, $A_{r,\lambda}$, to quantify the robustness of post-hoc explanations, and proves that it is lower-bounded by the predictor's probabilistic Lipschitzness. It derives theoretical bounds for the SHAP, Remove Individual, and RISE explainers, showing that locally smooth predictors yield more astute explanations, with explicit $\lambda$-scaling such as $\lambda = 2 \sqrt[p]{d} \;L$ (SHAP) or $\lambda = \sqrt[p]{d} \;L$ (RISE). The authors validate the theory empirically on simulated and real datasets, demonstrating that Lipschitz-regularized models achieve higher astuteness at smaller $\lambda$, and they propose astuteness as a practical metric for explainer robustness. The findings suggest that enforcing predictor smoothness is a viable strategy for improving explanation reliability, while acknowledging limits on the tightness of the bounds and dataset-specific nuances. Formally, the astuteness $A_{r,\lambda}(E, \mathcal{D})$ of an explainer $E$ over a data distribution $\mathcal{D}$ is tied to the predictor's local behavior through probabilistic Lipschitzness, which gives a quantifiable route to robust explanations for complex models.
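As a worked instance of the $\lambda$-scalings quoted above, take hypothetical values of $d = 10$ features, $p = 2$, and a local Lipschitz constant $L = 1.5$ (illustrative numbers, not taken from the paper):

$$
\lambda_{\text{SHAP}} = 2\sqrt[p]{d}\,L = 2\sqrt{10}\cdot 1.5 \approx 9.49, \qquad
\lambda_{\text{RISE}} = \sqrt[p]{d}\,L = \sqrt{10}\cdot 1.5 \approx 4.74 .
$$

Under these assumptions, both constants grow as $d^{1/p}$ with the input dimension and linearly with $L$, so halving $L$ (for example through Lipschitz regularization) halves the $\lambda$ at which the bound applies.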
Abstract
Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. Because explainers serve as crucial diagnostic tools, it is important that they themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
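The notion of "similar explanations for similar inputs" can be made concrete with a small Monte Carlo estimate. The sketch below is a minimal illustration, not the authors' implementation: it assumes astuteness is read as the fraction of sample pairs within distance $r$ of each other whose explanations differ by at most $\lambda$ times the input distance, measured in the $\ell_p$ norm. The function name `estimate_astuteness`, the `explain` callback, and the toy explainer are all hypothetical.

```python
import numpy as np


def estimate_astuteness(X, explain, r, lam, p=2):
    """Empirical astuteness estimate (assumed definition, see lead-in).

    X       : array of shape (n, d), samples from the data distribution
    explain : hypothetical callback mapping one sample (shape (d,)) to a
              1-D attribution vector
    r       : radius defining which pairs count as "similar inputs"
    lam     : allowed ratio between explanation distance and input distance
    p       : order of the l_p norm used for both distances
    """
    X = np.asarray(X, dtype=float)
    phi = np.asarray([explain(x) for x in X], dtype=float)

    # All unordered pairs (i, j) with i < j.
    i, j = np.triu_indices(len(X), k=1)
    d_x = np.linalg.norm(X[i] - X[j], ord=p, axis=1)
    d_phi = np.linalg.norm(phi[i] - phi[j], ord=p, axis=1)

    # Keep only pairs of nearby inputs.
    close = d_x <= r
    if not close.any():
        return float("nan")  # no pairs within radius r

    # Small tolerance guards against floating-point rounding at equality.
    ok = d_phi[close] <= lam * d_x[close] + 1e-12
    return float(ok.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))

    # Toy explainer (an assumption for illustration): attributions are 2 * x,
    # so explanation distances are exactly twice the input distances.
    def toy_explainer(x):
        return 2.0 * x

    for lam in (1.0, 2.0):
        print(lam, estimate_astuteness(X, toy_explainer, r=1.0, lam=lam))
```

Because the toy explainer doubles every distance, no nearby pair passes at $\lambda = 1$ and every nearby pair passes at $\lambda = 2$. For a real explainer, sweeping `lam` and plotting the estimate gives an astuteness curve, and comparing curves across models is one way to see whether a smoother predictor reaches high astuteness at smaller $\lambda$.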
