Missingness Bias Calibration in Feature Attribution Explanations

Shailesh Sridhar, Anton Xue, Eric Wong

TL;DR

MCal is a lightweight post-hoc method that corrects missingness bias by fine-tuning a simple linear head on the outputs of a frozen base model. It is competitive with, and sometimes outperforms, prior heavyweight approaches across diverse medical benchmarks spanning vision, language, and tabular domains.

Abstract

Popular explanation methods often produce unreliable feature importance scores due to missingness bias, a systematic distortion that arises when models are probed with ablated, out-of-distribution inputs. Existing solutions treat this as a deep representational flaw that requires expensive retraining or architectural modifications. In this work, we challenge this assumption and show that missingness bias can be effectively treated as a superficial artifact of the model's output space. We introduce MCal, a lightweight post-hoc method that corrects this bias by fine-tuning a simple linear head on the outputs of a frozen base model. Surprisingly, we find this simple correction consistently reduces missingness bias and is competitive with, or even outperforms, prior heavyweight approaches across diverse medical benchmarks spanning vision, language, and tabular domains.
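
The idea of fitting a linear head on a frozen model's outputs can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' implementation: it assumes the head is an affine map applied to the base model's log-probabilities followed by a softmax, trained with cross-entropy by plain gradient descent. The function names (`apply_head`, `fit_head`), the toy probabilities, and the hyperparameters are all hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def apply_head(W, b, probs):
    """Assumed calibrator form: affine map on base log-probabilities, then softmax."""
    x = [math.log(max(p, 1e-12)) for p in probs]
    z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(z)

def mean_cross_entropy(W, b, base_probs, labels):
    losses = [-math.log(apply_head(W, b, p)[y]) for p, y in zip(base_probs, labels)]
    return sum(losses) / len(losses)

def fit_head(base_probs, labels, lr=0.2, epochs=2000):
    """Full-batch gradient descent on the head only; the base model is never touched."""
    k = len(base_probs[0])
    # Identity weights and zero bias reproduce the base model's outputs exactly.
    W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    n = len(labels)
    for _ in range(epochs):
        grad_W = [[0.0] * k for _ in range(k)]
        grad_b = [0.0] * k
        for probs, y in zip(base_probs, labels):
            x = [math.log(max(p, 1e-12)) for p in probs]
            q = apply_head(W, b, probs)
            for i in range(k):
                err = (q[i] - (1.0 if i == y else 0.0)) / n
                grad_b[i] += err
                for j in range(k):
                    grad_W[i][j] += err * x[j]
        for i in range(k):
            b[i] -= lr * grad_b[i]
            for j in range(k):
                W[i][j] -= lr * grad_W[i][j]
    return W, b

def accuracy(W, b, base_probs, labels):
    hits = 0
    for p, y in zip(base_probs, labels):
        q = apply_head(W, b, p)
        hits += int(q.index(max(q)) == y)
    return hits / len(labels)

# Toy stand-in for frozen-model outputs on ablated inputs (made-up numbers):
# class 0 = "healthy", class 1 = "tumor"; every row is skewed towards class 0.
base_probs = [[0.90, 0.10], [0.85, 0.15], [0.92, 0.08],
              [0.60, 0.40], [0.65, 0.35], [0.55, 0.45]]
labels = [0, 0, 0, 1, 1, 1]

identity_W, zero_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W, b = fit_head(base_probs, labels)
base_acc = accuracy(identity_W, zero_b, base_probs, labels)
calibrated_acc = accuracy(W, b, base_probs, labels)
```

On this toy data the uncalibrated argmax is always class 0, so half the samples are misclassified; the fitted head only rescales and shifts the outputs, which is enough here because the two classes remain separable in log-probability space.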

Paper Structure

This paper contains 47 sections, 1 theorem, 5 equations, 12 figures, and 1 table.

Key Result

Theorem 3.1

The MCal objective $\mathcal{L}(\theta)$ is convex in $\theta$.
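
The exact form of $\mathcal{L}(\theta)$ is not reproduced on this page. As a hedged sketch of why a result of this kind is plausible, assume (this is an assumption, not the paper's stated definition) that $\theta = (W, b)$, that the calibrator is $R_\theta(p) = \mathrm{softmax}(W \log p + b)$, and that the objective is the average cross-entropy over $n$ calibration pairs $(p_i, y_i)$. Writing $z_i(\theta) = W \log p_i + b$, which is affine in $\theta$:

$$
\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \Big[ \log \sum_{k} \exp\big(z_{i,k}(\theta)\big) - z_{i,y_i}(\theta) \Big]
$$

The log-sum-exp function is convex, a convex function composed with an affine map stays convex, and subtracting the affine term $z_{i,y_i}(\theta)$ preserves convexity. An average of convex functions is convex, so under these assumptions $\mathcal{L}(\theta)$ is convex in $\theta$, and any local minimum found by gradient-based training is a global one.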

Figures (12)

  • Figure 1: Removing irrelevant features can cause a misdiagnosis. A fine-tuned ViT (Dosovitskiy et al., 2020) correctly predicts "tumor" on the clean image (left) and a subset of the relevant features (middle). However, masking irrelevant features flips the prediction to "healthy", despite the tumor remaining visible (right). For visualization, gray stripes denote zero-valued pixels, and images are contrast-boosted.
  • Figure 2: Feature ablations induce class distribution shifts. Masking non-critical regions skews predictions towards the "healthy" class, even when tumors remain visible. This effect, known as missingness bias, causes the model to misclassify inputs that retain relevant features, and undermines the reliability of feature attribution explanations.
  • Figure 3: MCal corrects class distribution shifts induced by input ablations. The model initially predicts "healthy" from the ablated input. MCal applies a learned transformation $R_\theta$ to adjust the output probabilities, thereby restoring alignment with expected class distributions. This calibration method is model-agnostic, requiring only the classifier's output probabilities of each class.
  • Figure 4: Geometric intuition of MCal on a synthetic dataset. Missingness bias causes the uncalibrated outputs to shift. For instance, the Class A cluster (blue circles) is pulled towards the Class B vertex, leading to systematic misclassification and low accuracy. MCal applies an optimal affine transformation to the uncalibrated outputs, correcting the shift and improving accuracy.
  • Figure 5: Calibrated models have better explanations. Compared to an uncalibrated baseline model (Base), LIME and SHAP explanations on MCal-calibrated models have more accurate feature importance scores (sufficiency $\downarrow$). Calibrated models are also more robust to feature ablations (sensitivity $\downarrow$). Results are shown for the MRI dataset using an unconditioned calibrator.
  • ...and 7 more figures
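
The class distribution shift described in Figure 2 can be checked empirically by comparing predicted-class frequencies on clean and ablated inputs. The sketch below is illustrative only: `toy_predict` is a deliberately biased stand-in classifier, the inputs are made-up numbers, and total variation distance is one possible choice of shift measure rather than the metric used in the paper.

```python
from collections import Counter

def ablate(x, keep_mask, fill=0.0):
    """Replace every feature whose mask entry is False with a fill value."""
    return [v if keep else fill for v, keep in zip(x, keep_mask)]

def class_frequencies(predict, inputs):
    """Fraction of inputs assigned to each predicted class."""
    counts = Counter(predict(x) for x in inputs)
    return {label: n / len(inputs) for label, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

def toy_predict(x):
    # Hypothetical classifier: relies on mean intensity, so zero-filling
    # masked features pushes it towards "healthy".
    return "tumor" if sum(x) / len(x) > 0.5 else "healthy"

clean_inputs = [
    [0.9, 0.8, 0.7, 0.6],
    [0.9, 0.9, 0.2, 0.3],
    [0.1, 0.2, 0.1, 0.3],
    [0.8, 0.9, 0.6, 0.5],
]
keep_mask = [True, True, False, False]  # ablate the last two features
ablated_inputs = [ablate(x, keep_mask) for x in clean_inputs]

clean_freq = class_frequencies(toy_predict, clean_inputs)
ablated_freq = class_frequencies(toy_predict, ablated_inputs)
shift = total_variation(clean_freq, ablated_freq)
```

A large `shift` means the ablation alone changes which classes the model predicts, which is the symptom a calibrator of this kind is meant to correct.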

Theorems & Definitions (2)

  • Theorem 3.1: Guaranteed Optimal Convergence
  • proof