Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?

Kamalasankari Subramaniakuppusamy, Jugal Gajjar

Abstract

Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stability under realistic input perturbations remains poorly characterized. Existing metrics evaluate explanations primarily under additive noise, collapse stability to a single scalar, and fail to condition on prediction preservation, conflating explanation fragility with model sensitivity. We introduce the Feature Attribution Stability Suite (FASS), a benchmark that enforces prediction-invariance filtering, decomposes stability into three complementary metrics (structural similarity, rank correlation, and top-k Jaccard overlap), and evaluates across geometric, photometric, and compression perturbations. Evaluating four attribution methods (Integrated Gradients, GradientSHAP, Grad-CAM, LIME) across four architectures and three datasets (ImageNet-1K, MS COCO, and CIFAR-10), FASS shows that stability estimates depend critically on perturbation family and prediction-invariance filtering. Geometric perturbations expose substantially greater attribution instability than photometric changes, and without conditioning on prediction preservation, up to 99% of evaluated pairs involve changed predictions. Under this controlled evaluation, we observe consistent method-level trends, with Grad-CAM achieving the highest stability across datasets.
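The three stability components named in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's reference implementation: the SSIM here is a simplified single-window variant (full SSIM uses local sliding windows), the Spearman computation assumes tie-free attribution values, and treating `k` as a fixed fraction of pixels and averaging the three components with equal weights are both assumptions about how the composite FASS score is formed.

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    # Spatial component: simplified single-window SSIM over the whole
    # attribution map (full SSIM averages over local windows).
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def spearman(a, b):
    # Ordinal component: Spearman rank correlation of flattened maps
    # (double argsort yields ranks; assumes no tied values).
    ra = np.argsort(np.argsort(a.ravel())).astype(float)
    rb = np.argsort(np.argsort(b.ravel())).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

def topk_jaccard(a, b, k=0.1):
    # Salient-region component: Jaccard overlap of the top-k fraction
    # of highest-attribution pixels (k as a fraction is an assumption).
    top = max(1, int(k * a.size))
    sa = set(np.argsort(a.ravel())[-top:].tolist())
    sb = set(np.argsort(b.ravel())[-top:].tolist())
    return len(sa & sb) / len(sa | sb)

def fass_score(a, b):
    # Composite score: unweighted mean of the three components
    # (the paper says the metrics are "averaged"; equal weights assumed).
    return (ssim_global(a, b) + spearman(a, b) + topk_jaccard(a, b)) / 3.0
```

For identical attribution maps all three components equal 1, so the composite score is 1; divergent maps pull each component down independently, which is what lets FASS distinguish spatial, ordinal, and salient-region instability.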

Paper Structure

This paper contains 25 sections, 6 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: FASS evaluation pipeline. Each input image is paired with its perturbed counterpart. Only prediction-invariant pairs proceed to attribution computation; excluded pairs are reported as retention diagnostics. Stability is decomposed into spatial (SSIM), ordinal (Spearman), and salient-region (Jaccard) components, averaged into a composite FASS score.
  • Figure 2: Attribution stability barplots across all three datasets. Top: ImageNet (native resolution; highest overall stability). Middle: CIFAR-10 (distribution-mismatch stress test; $32\times32$ inputs upsampled $7\times$). Bottom: COCO (multi-object scenes; LIME narrows the gap with gradient-based methods). In all three settings Grad-CAM achieves the highest per-method FASS, and the method ranking Grad-CAM $>$ IG $>$ GradientSHAP $>$ LIME is preserved without rank crossings across all datasets.
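The prediction-invariance filter described in the Figure 1 caption can be sketched as follows. `model_predict` is a hypothetical stand-in for a top-1 classifier; the retention rate it reports corresponds to the excluded-pair diagnostics the pipeline surfaces.

```python
import numpy as np

def prediction_invariant_pairs(model_predict, originals, perturbed):
    # Keep only (x, x') pairs where the model's top-1 prediction is
    # unchanged by the perturbation; report the retention rate as a
    # diagnostic, since excluded pairs reflect model sensitivity
    # rather than explanation fragility.
    kept, total = [], 0
    for x, xp in zip(originals, perturbed):
        total += 1
        if model_predict(x) == model_predict(xp):
            kept.append((x, xp))
    retention = len(kept) / total if total else 0.0
    return kept, retention

# Usage with a toy stand-in classifier (illustrative only):
def toy_predict(x):
    return int(x.sum() > 0)
```

Only the retained pairs proceed to attribution computation and stability scoring, so a low retention rate is itself a finding: the abstract reports that without this filter, up to 99% of evaluated pairs involve changed predictions.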