Localising Shortcut Learning in Pixel Space via Ordinal Scoring Correlations for Attribution Representations (OSCAR)

Akshit Achara, Peter Triantafillou, Esther Puyol-Antón, Alexander Hammers, Andrew P. King

TL;DR

OSCAR is a pixel-space auditing framework that converts image-level attribution maps into dataset-level region rankings for three models: a balanced baseline, a test model, and a sensitive-attribute predictor. It defines pairwise, partial, and deviation-based correlations over the aggregated region rankings and derives Region Contribution Scores (RCS), which localise shortcut features and quantify their dependence on sensitive attributes. The approach is validated on natural and medical imaging datasets, where the correlations are stable across seeds, sensitive to the level of association between shortcut features and output labels, and able to distinguish localised from diffuse shortcuts. Test-time attenuation using RCS-derived maps is also shown to reduce bias. OSCAR thus offers a practical, model-agnostic, post-hoc tool for auditing, localising, and mitigating shortcut learning directly in pixel space.

Abstract

Deep neural networks often exploit shortcuts. These are spurious cues which are associated with output labels in the training data but are unrelated to task semantics. When the shortcut features are associated with sensitive attributes, shortcut learning can lead to biased model performance. Existing methods for localising and understanding shortcut learning are mostly based upon qualitative, image-level inspection and assume cues are human-visible, limiting their use in domains such as medical imaging. We introduce OSCAR (Ordinal Scoring Correlations for Attribution Representations), a model-agnostic framework for quantifying shortcut learning and localising shortcut features. OSCAR converts image-level task attribution maps into dataset-level rank profiles of image regions and compares them across three models: a balanced baseline model (BA), a test model (TS), and a sensitive attribute predictor (SA). By computing pairwise, partial, and deviation-based correlations on these rank profiles, we produce a set of quantitative metrics that characterise the degree of shortcut reliance for TS, together with a ranking of image-level regions that contribute most to it. Experiments on CelebA, CheXpert, and ADNI show that our correlations are (i) stable across seeds and partitions, (ii) sensitive to the level of association between shortcut features and output labels in the training data, and (iii) able to distinguish localised from diffuse shortcut features. As an illustration of the utility of our method, we show how worst-group performance disparities can be reduced using a simple test-time attenuation approach based on the identified shortcut regions. OSCAR provides a lightweight, pixel-space audit that yields statistical decision rules and spatial maps, enabling users to test, localise, and mitigate shortcut reliance. The code is available at https://github.com/acharaakshit/oscar
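The conversion from image-level attribution maps to a dataset-level rank profile can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names (`region_scores`, `ordinal_ranks`, `rank_profile`), the use of the mean as the region-level summary, and the use of the mean rank as the dataset-level aggregate are all assumptions made for the example. The released code at the repository above is the authoritative reference.

```python
import numpy as np


def region_scores(attr_map: np.ndarray, grid: int) -> np.ndarray:
    """Summarise an H x W attribution map over a grid x grid partition.

    Assumption: each region is summarised by its mean attribution.
    Returns a flat vector of length grid * grid in row-major order.
    """
    h, w = attr_map.shape
    if h % grid or w % grid:
        raise ValueError("map height and width must be divisible by grid")
    blocks = attr_map.reshape(grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(1, 3)).ravel()


def ordinal_ranks(scores: np.ndarray) -> np.ndarray:
    """Rank regions within one image; rank 1 is the lowest score.

    Simplification: ties are broken by region index, not averaged.
    """
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    return ranks


def rank_profile(attr_maps: np.ndarray, grid: int) -> np.ndarray:
    """Aggregate per-image region ranks into one dataset-level profile.

    Assumption: aggregation is the mean rank of each region across images.
    """
    per_image = np.stack(
        [ordinal_ranks(region_scores(a, grid)) for a in attr_maps]
    )
    return per_image.mean(axis=0)


# Synthetic example: 8 random maps in which the top-left cell of a
# 4 x 4 grid always receives the largest attribution.
rng = np.random.default_rng(0)
maps = rng.random((8, 32, 32))
maps[:, :8, :8] += 5.0
profile = rank_profile(maps, grid=4)
print(profile.reshape(4, 4))
```

In this synthetic setting the top-left region receives the highest mean rank because it dominates every image. One such profile would be computed for each of BA, TS, and SA before any correlations are taken.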

Paper Structure

This paper contains 59 sections, 2 theorems, 15 equations, 16 figures, 6 tables, 2 algorithms.

Key Result

Proposition 1

Let $g^M:\mathbb R\to\mathbb R$ be strictly increasing for each model $M$, and define transformed scores $\tilde{s}_{i,r}^M = g^M(s_{i,r}^M)$. Let $\widetilde{\mathcal{R}}^{M}$ be the OSCAR profile constructed from $\tilde{s}_{i,r}^M$. Then, for every $M$, $\widetilde{\mathcal{R}}^{M} = \mathcal{R}^
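The property behind this kind of invariance is that a strictly increasing map preserves the ordering of the scores, and therefore their ranks. The short check below illustrates this for a single image. The helper `ordinal_ranks` and the particular choice of $g$ are hypothetical and serve only as an example.

```python
import numpy as np


def ordinal_ranks(scores: np.ndarray) -> np.ndarray:
    """Rank 1 is the lowest score; ties are broken by index."""
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(scores.size, dtype=int)
    ranks[order] = np.arange(1, scores.size + 1)
    return ranks


def g(s: np.ndarray) -> np.ndarray:
    """A strictly increasing map on the reals (derivative 3s^2 + 2 > 0)."""
    return s**3 + 2.0 * s


scores = np.array([0.2, -1.5, 3.0, 0.7])  # hypothetical region scores
ranks_raw = ordinal_ranks(scores)
ranks_transformed = ordinal_ranks(g(scores))

# The ranks agree, so any profile built from them is unchanged.
print(ranks_raw, ranks_transformed)
```

Because the per-image ranks are identical before and after the transformation, any aggregate built only from those ranks is identical as well. A practical consequence is that attribution methods whose outputs differ only by a monotone rescaling would yield the same profile.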

Figures (16)

  • Figure 1: Overview of the OSCAR framework. (i) From the input data, three different models are trained: TS, SA and BA (Section \ref{subsection:methodstraining}). (ii) These models are used to obtain attribution maps for each test set sample (Section \ref{subsection:interpretabmethod}). (iii) The attribution maps are partitioned into disjoint regions and the attribution values summarised at the region level to form a ranked set for each image (Section \ref{subsection:partitions}). (iv) The ranked sets are aggregated across the test set to form dataset-level rank profiles $\mathcal{R}$ for each of TS, SA and BA (Section \ref{subsection:rankanalysis}). (v) Correlations are computed using the rank profiles and shortcut hypotheses tested (Section \ref{subsection:hypotheses}). (vi) A corresponding region contribution score (RCS) map is obtained, which indicates the region(s) that contributed to the shortcut learning (Section \ref{subsection:rcs}).
  • Figure 2: Worst group accuracy for classification of target labels. RN50 stands for ResNet50 and INV3 for InceptionV3.
  • Figure 3: Worst group accuracy over varying number of discordant pairs for CelebA, CheXpert and ADNI datasets.
  • Figure 4: Partial $\rho_{\mathcal{R}^{TS}\mathcal{R}^{SA}.\mathcal{R}^{BA}}$, deviation-based $\rho_{dev\ \mathcal{R}^{TS}\mathcal{R}^{SA}.\mathcal{R}^{BA}}$ and pairwise $\rho(\mathcal{R}^{TS}, \mathcal{R}^{SA})$ correlations over varying numbers of discordant pairs for CelebA, CheXpert and ADNI datasets using Grad-CAM attributions and $16\times 16$ grid partitioning. RN50 and INV3 stand for ResNet50 and InceptionV3 respectively.
  • Figure 5: Region contribution scores (RCS) across increasing number of discordant pairs. Each row shows, from left-to-right, $25\ \text{samples} \to 10\% \to 20\% \to 30\% \to 40\%$ discordant pairs. All attributions were produced using Grad-CAM based on the grid-based $16\times16$ partitioning approach.
  • ...and 11 more figures

Theorems & Definitions (5)

  • Definition 1: Partial correlation on aggregated ranks
  • Proposition 1: Monotone Invariance
  • proof
  • Proposition 2: RCS as a Per-Region Decomposition of Partial Correlation
  • proof
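The listing above gives Definition 1 and Proposition 2 by title only, so the following sketch relies on the standard first-order partial correlation, $\rho_{xy.z} = (\rho_{xy} - \rho_{xz}\rho_{yz}) / \sqrt{(1-\rho_{xz}^2)(1-\rho_{yz}^2)}$, applied to three rank profiles. The per-region terms are one natural decomposition that sums to this value, built from the residuals left after regressing on the BA profile. They are an assumption for illustration and are not claimed to match the paper's exact RCS formula. All function names are hypothetical.

```python
import numpy as np


def partial_corr(ts: np.ndarray, sa: np.ndarray, ba: np.ndarray) -> float:
    """First-order partial correlation of ts and sa, controlling for ba."""
    r_ts_sa = np.corrcoef(ts, sa)[0, 1]
    r_ts_ba = np.corrcoef(ts, ba)[0, 1]
    r_sa_ba = np.corrcoef(sa, ba)[0, 1]
    denom = np.sqrt((1.0 - r_ts_ba**2) * (1.0 - r_sa_ba**2))
    return float((r_ts_sa - r_ts_ba * r_sa_ba) / denom)


def residualise(v: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Residuals of v after least-squares regression on control."""
    slope, intercept = np.polyfit(control, v, 1)
    return v - (slope * control + intercept)


def per_region_terms(ts: np.ndarray, sa: np.ndarray, ba: np.ndarray) -> np.ndarray:
    """Illustrative per-region contributions that sum to partial_corr.

    Assumption: contributions are normalised products of residuals.
    """
    e_ts = residualise(ts, ba)
    e_sa = residualise(sa, ba)
    denom = np.sqrt(np.sum(e_ts**2) * np.sum(e_sa**2))
    return e_ts * e_sa / denom


# Synthetic rank profiles over 16 regions (hypothetical values).
rng = np.random.default_rng(0)
ba = rng.random(16)
sa = rng.random(16)
ts = 0.7 * ba + 0.3 * sa + 0.05 * rng.random(16)

rho = partial_corr(ts, sa, ba)
terms = per_region_terms(ts, sa, ba)
print(rho, terms.sum())
```

The two printed values agree up to floating-point error, because the partial correlation equals the correlation of the two residual vectors. Regions with large positive terms are those where TS and SA deviate from BA in the same direction, which is the intuition behind reading such a map as a localisation of shortcut reliance.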