Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration
Thomas Decker, Volker Tresp, Florian Buettner
TL;DR
This work addresses the instability of perturbation-based explanations caused by miscalibrated model confidence under perturbations. It theoretically links calibration quality to both global and local explanation fidelity and introduces ReCalX, an information-preserving, perturbation-aware recalibration method with adaptive temperatures. The theoretical results show that perfect calibration under perturbations yields ideal explanatory power. Empirical results across tabular and image tasks show that ReCalX substantially reduces perturbation-specific calibration error and improves explanation robustness and feature retraining fidelity. The method offers a practical, low-overhead way to improve explanation quality, with broad applicability beyond classification tasks.
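To make "information-preserving recalibration with adaptive temperatures" concrete, the following is a minimal sketch of generic temperature scaling in NumPy. It is not the ReCalX implementation: the function names, the toy logits, and the per-sample temperatures are assumptions chosen for illustration. It only shows why dividing logits by a positive temperature changes confidence but never changes the predicted class.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def temperature_scale(logits, temperatures):
    """Rescale logits by one positive temperature per sample.

    `temperatures` has shape (n_samples,). In a perturbation-aware
    setting, each value could depend on how strongly the sample was
    perturbed; the values used below are hypothetical.
    """
    temperatures = np.asarray(temperatures, dtype=float)
    if np.any(temperatures <= 0):
        raise ValueError("temperatures must be strictly positive")
    return softmax(logits / temperatures[:, None])


# Toy logits for two samples and three classes (illustrative values).
logits = np.array([[4.0, 1.0, 0.5],
                   [0.2, 2.5, 2.0]])

probs_raw = temperature_scale(logits, [1.0, 1.0])       # unchanged model
probs_scaled = temperature_scale(logits, [2.5, 1.5])    # hypothetical temperatures

# Dividing by a positive number keeps the ordering of the logits,
# so the predicted class is identical before and after scaling.
same_prediction = np.array_equal(probs_raw.argmax(axis=1),
                                 probs_scaled.argmax(axis=1))
```

Temperatures above 1 soften the distribution and lower the top-class confidence, which is the usual remedy for overconfident outputs.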
Abstract
Perturbation-based explanations are widely used to make machine-learning models more transparent in practice. However, their reliability is often compromised because the model's behavior under the specific perturbations used is unknown. This paper investigates the relationship between uncertainty calibration (the alignment of model confidence with actual accuracy) and perturbation-based explanations. We show that models systematically produce unreliable probability estimates when subjected to explainability-specific perturbations, and we prove theoretically that this directly undermines global and local explanation quality. To address this, we introduce ReCalX, a novel approach that recalibrates models for improved explanations while preserving their original predictions. Empirical evaluations across diverse models and datasets demonstrate that ReCalX consistently achieves the largest reduction in perturbation-specific miscalibration while enhancing explanation robustness and the identification of globally important input features.
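Calibration, understood as the alignment of confidence with accuracy, is commonly quantified with a binned expected calibration error (ECE). The sketch below is one standard way to compute it and is offered as an illustration only; the function name, bin count, and toy data are assumptions, and the paper's own calibration measure may differ. The same function could be applied to probabilities obtained on perturbed inputs to estimate miscalibration under a given perturbation.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| per bin."""
    confidences = probs.max(axis=1)        # top-class probability
    predictions = probs.argmax(axis=1)     # predicted class
    correct = (predictions == labels).astype(float)

    # Equal-width bins over [0, 1]; bin b covers (edges[b], edges[b + 1]].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap     # weight by fraction of samples
    return ece


# Toy example: the model is 90% confident on every sample
# but only half of its predictions are right.
probs = np.array([[0.9, 0.1]] * 4)
labels = np.array([0, 0, 1, 1])

ece_overconfident = expected_calibration_error(probs, labels)
```

In this toy case all samples fall into one bin with confidence 0.9 and accuracy 0.5, so the calibration gap is 0.4.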
