Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration

Thomas Decker, Volker Tresp, Florian Buettner

TL;DR

This work addresses the instability of perturbation-based explanations caused by miscalibrated model confidences under perturbations. It theoretically links calibration quality to both global and local explanation fidelity and introduces ReCalX, an information-preserving, perturbation-aware recalibration method with adaptive temperatures. Theoretical results show that perfect calibration under perturbations yields ideal explanatory power, while empirical results across tabular and image tasks demonstrate that ReCalX substantially reduces perturbation-specific calibration error and improves explanation robustness and feature retraining fidelity. The method offers a practical, low-overhead enhancement to explanation quality with broad applicability beyond classification tasks.

Abstract

Perturbation-based explanations are widely used to enhance the transparency of machine-learning models in practice. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration (the alignment of model confidence with actual accuracy) and perturbation-based explanations. We show that models systematically produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines global and local explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved explanations while preserving their original predictions. Empirical evaluations across diverse models and datasets demonstrate that ReCalX consistently achieves the largest reduction in perturbation-specific miscalibration while also enhancing explanation robustness and the identification of globally important input features.
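
To make the notion of calibration concrete, the following is a minimal sketch of a binned expected calibration error (ECE), a common way to quantify the gap between confidence and accuracy. It is an illustration only: this summary does not state which calibration metric the paper uses, and the function name, the binning scheme, and the toy data are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     1 if the corresponding prediction was right, else 0.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Map the confidence to an equal-width bin; 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Each bin contributes its confidence/accuracy gap, weighted by size.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data (assumed): the model claims 90% confidence but is right 3 times in 4.
toy_confidences = [0.9, 0.9, 0.9, 0.9]
toy_correct = [1, 1, 1, 0]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
print(round(toy_ece, 3))
```

In the toy data the model is overconfident by roughly 0.15. Evaluating the same quantity on perturbed inputs, grouped by perturbation level, would give the kind of perturbation-specific calibration error the abstract refers to.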

Paper Structure

This paper contains 18 sections, 4 theorems, 12 equations, 5 figures, 3 tables.

Key Result

Theorem 3.2

Let $\mathcal{L}_{\textit{CE}}$ be the cross-entropy loss, $D_{\textit{KL}}(\cdot,\cdot)$ be the KL-Divergence between two distributions, and let $I(\cdot,\cdot)$ denote the mutual information between random variables. Then we have:

Figures (5)

  • Figure 1: Perturbation-based explanation methods typically query the model on modified inputs and aggregate the resulting prediction changes to identify relevant features. However, we show that models typically produce significantly miscalibrated output probabilities under commonly used perturbations. This means that the underlying predictions used to derive explanations do not reflect actual changes in class likelihoods, obscuring true feature importance. To mitigate this, we propose ReCalX as a simple recalibration technique that enables reliable outputs under explainability-specific perturbations, leading to more informative explanation results.
  • Figure 2: Normalized calibration errors aggregated across 10 tabular datasets for an MLP (left) and a ResNet model (right), including 95% confidence intervals. For both models, the miscalibration under the mean replacement perturbation tends to increase uniformly with higher perturbation levels.
  • Figure 3: Calibration error results for popular image classifiers on ImageNet under fixed baseline perturbation with zeros. Across all methods, miscalibration varies significantly with perturbation severity. For most image models the error tends to grow with the perturbation level, but for a ResNet50 miscalibration is worst at lower levels. This model-dependent behavior highlights the importance of calibration strategies that adapt to the perturbation strength.
  • Figure 4: Retraining results on four tabular datasets for an MLP (top row) and a ResNet (bottom row) when the features are removed based on their global importance estimated via Shapley Values. Whenever calibrated explanations imply a different importance ranking (green area), the resulting performance loss is consistently higher compared to the uncalibrated importance indications. Hence, ReCalX enables better identification of truly relevant features that are crucial for good performance.
  • Figure 5: Qualitative comparison of Shapley Value explanations before and after ReCalX calibration for a DenseNet121 (top row) and a SigLIP zero-shot model (bottom row). The calibrated explanations exhibit stronger focus on discriminative object regions and reduced noise, demonstrating how miscalibration correction leads to more informative and robust feature attributions.
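
The captions above motivate recalibration that adapts to the perturbation strength while leaving predictions unchanged. The sketch below illustrates that general idea with plain temperature scaling fitted separately per perturbation level. It is not the authors' ReCalX implementation, whose details are not given in this summary; the function names, the grid search, and the toy logits are all assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimises the average NLL."""
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0

    def nll(t):
        total = 0.0
        for row, y in zip(logit_rows, labels):
            # Floor the probability to avoid log(0) on extreme logits.
            total -= math.log(max(softmax(row, t)[y], 1e-12))
        return total / len(labels)

    return min(grid, key=nll)


def fit_level_temperatures(data_by_level):
    """One temperature per perturbation level (hypothetical helper)."""
    return {
        level: fit_temperature(rows, labels)
        for level, (rows, labels) in data_by_level.items()
    }


# Toy validation logits (assumed): confident and correct on clean inputs,
# equally confident but only 3/4 correct when half the features are perturbed.
data_by_level = {
    0.0: ([[2.0, 0.0], [0.0, 2.0]], [0, 1]),
    0.5: ([[4.0, 0.0]] * 4, [0, 0, 0, 1]),
}
temps = fit_level_temperatures(data_by_level)

# Dividing logits by a positive temperature never changes the argmax,
# so the predicted class is preserved at every perturbation level.
example_logits = [1.0, 3.0, 2.0]
for level, t in temps.items():
    probs = softmax(example_logits, t)
    print(level, t, probs.index(max(probs)))
```

Because a positive temperature only rescales the logits, the predicted class stays the same while the confidence is softened or sharpened, which matches the prediction-preserving property the abstract attributes to ReCalX. In the toy data the perturbed level receives a temperature above 1, reflecting overconfidence under perturbation.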

Theorems & Definitions (6)

  • Definition 3.1
  • Theorem 3.2
  • Corollary 3.3
  • Theorem 3.4
  • Definition 4.1
  • Proposition 4.2