Localising Shortcut Learning in Pixel Space via Ordinal Scoring Correlations for Attribution Representations (OSCAR)
Akshit Achara, Peter Triantafillou, Esther Puyol-Antón, Alexander Hammers, Andrew P. King
TL;DR
OSCAR is a pixel-space auditing framework that converts image-level attribution maps into dataset-level region rankings for three models: a balanced baseline, a test model, and a sensitive-attribute predictor. By computing pairwise, partial, and deviation-based correlations over the aggregated region rankings and deriving Region Contribution Scores (RCS), it localises shortcut features and assesses their dependence on sensitive attributes. The approach is validated on natural and medical imaging datasets, where it is stable across seeds, sensitive to the strength of the association between shortcut features and output labels, and able to distinguish localised from diffuse shortcuts. It also shows that test-time attenuation using RCS-derived maps can mitigate bias. OSCAR thus offers a practical, model-agnostic, post-hoc tool for auditing, localising, and mitigating shortcut learning directly in pixel space.
Abstract
Deep neural networks often exploit shortcuts: spurious cues that are associated with output labels in the training data but are unrelated to task semantics. When the shortcut features are associated with sensitive attributes, shortcut learning can lead to biased model performance. Existing methods for localising and understanding shortcut learning are mostly based on qualitative, image-level inspection and assume that cues are human-visible, which limits their use in domains such as medical imaging. We introduce OSCAR (Ordinal Scoring Correlations for Attribution Representations), a model-agnostic framework for quantifying shortcut learning and localising shortcut features. OSCAR converts image-level task attribution maps into dataset-level rank profiles of image regions and compares them across three models: a balanced baseline model (BA), a test model (TS), and a sensitive attribute predictor (SA). By computing pairwise, partial, and deviation-based correlations on these rank profiles, we produce a set of quantitative metrics that characterise the degree of shortcut reliance for TS, together with a ranking of the image regions that contribute most to it. Experiments on CelebA, CheXpert, and ADNI show that our correlations are (i) stable across seeds and partitions, (ii) sensitive to the level of association between shortcut features and output labels in the training data, and (iii) able to distinguish localised from diffuse shortcut features. As an illustration of the utility of our method, we show how worst-group performance disparities can be reduced using a simple test-time attenuation approach based on the identified shortcut regions. OSCAR provides a lightweight, pixel-space audit that yields statistical decision rules and spatial maps, enabling users to test, localise, and mitigate shortcut reliance. The code is available at https://github.com/acharaakshit/oscar
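To make the rank-profile comparison concrete, the following is a minimal sketch rather than the released implementation. It assumes that regions are cells of a fixed grid, that region scores are mean absolute attributions over the dataset, that the pairwise correlation is Spearman's rho (Pearson correlation of ranks), and that the partial correlation is the standard first-order formula controlling for BA. The function names `region_rank_profile`, `spearman`, and `partial_corr`, and the synthetic attribution maps, are hypothetical and used for illustration only; the paper's own definitions, including the deviation-based correlation and the Region Contribution Scores, may differ.

```python
import numpy as np


def region_rank_profile(attr_maps, grid=4):
    """Dataset-level rank profile of grid regions (illustrative assumption).

    attr_maps: array of shape (n_images, H, W), with H and W divisible by grid.
    Returns integer ranks of shape (grid * grid,); 1 = least attributed region.
    Ties are broken arbitrarily by argsort.
    """
    n, h, w = attr_maps.shape
    cells = np.abs(attr_maps).reshape(n, grid, h // grid, grid, w // grid)
    region_scores = cells.mean(axis=(0, 2, 4)).ravel()
    return region_scores.argsort().argsort() + 1


def spearman(rank_a, rank_b):
    """Spearman's rho, computed as the Pearson correlation of two rank vectors."""
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Synthetic attribution maps (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
h = w = 32
task = np.zeros((h, w))
task[8:24, 8:24] = 1.0      # central region standing in for task-relevant pixels
shortcut = np.zeros((h, w))
shortcut[:8, :8] = 1.0      # corner region standing in for a shortcut cue


def noise():
    return 0.1 * rng.random((50, h, w))


ba_maps = task + noise()             # balanced baseline attends to the task region
sa_maps = shortcut + noise()         # attribute predictor attends to the shortcut
ts_maps = task + shortcut + noise()  # test model attends to both

ba_rank = region_rank_profile(ba_maps)
sa_rank = region_rank_profile(sa_maps)
ts_rank = region_rank_profile(ts_maps)

r_ts_ba = spearman(ts_rank, ba_rank)
r_ts_sa = spearman(ts_rank, sa_rank)
r_ba_sa = spearman(ba_rank, sa_rank)
# Agreement between TS and SA that is not already explained by BA.
r_ts_sa_given_ba = partial_corr(r_ts_sa, r_ts_ba, r_ba_sa)

print("rho(TS, BA) =", round(r_ts_ba, 3))
print("rho(TS, SA) =", round(r_ts_sa, 3))
print("rho(TS, SA | BA) =", round(r_ts_sa_given_ba, 3))
```

Under these assumptions, a TS model whose rank profile agrees with SA beyond what BA already explains would show a positive partial correlation, which is the kind of signal the abstract describes as evidence of shortcut reliance.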
