Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables

James Hinns, David Martens

TL;DR

The paper tackles the problem of models exploiting shortcuts that harm generalization in image classification. It presents Semantic Counterfactuals for Accurate Picture (SCAP) explanations to label semantically meaningful image segments and Counterfactual Frequency (CoF) tables to aggregate these explanations across datasets, revealing global shortcut patterns. Key contributions include the SCAP framework, the CoF table construction formula $CoF(l)=\sum_{x=1}^{n}\sum_{j=1}^{m_x} \mathbf{1}(label(S_x^j)=l \land g(i_x,S_x^j) \neq c_x)$, and demonstrations on datasets such as Colour MNIST, BAR, and ImageNet variants, exposing shortcuts like watermark cues and background biases. The findings demonstrate how CoF tables provide a scalable, interpretable means to diagnose and mitigate shortcut reliance, with practical implications for robust model evaluation and data curation. This approach supports enhanced model cards and targeted dataset interventions to improve generalization in real-world deployments.
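The CoF formula in the TL;DR can be read as a counting procedure. The sketch below is one minimal reading of it, not the authors' implementation. It assumes that $n$ is the number of images, $m_x$ the number of segments in image $i_x$, $label(S_x^j)$ the semantic label of segment $S_x^j$, $c_x$ the model's original prediction for $i_x$, and $g(i_x, S_x^j)$ the prediction after that segment is edited. The names `cof_table`, `edited_prediction`, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical representation: each segment is (semantic_label, segment_id).
LabelledSegment = Tuple[str, Hashable]


def cof_table(
    images: Sequence[Hashable],
    segments: Sequence[List[LabelledSegment]],
    original_classes: Sequence[Hashable],
    edited_prediction: Callable[[Hashable, Hashable], Hashable],
) -> Counter:
    """Count, per semantic label, the segment edits that change the prediction."""
    counts: Counter = Counter()
    # Outer sum over images x = 1..n.
    for image, segs, c_x in zip(images, segments, original_classes):
        # Inner sum over the segments j = 1..m_x of this image.
        for label, seg_id in segs:
            # Indicator term: editing this segment changes the predicted class.
            if edited_prediction(image, seg_id) != c_x:
                counts[label] += 1
    return counts


# Toy stand-in for g: a lookup of which (image, segment) edits flip the class.
# A real setup would edit the segment in the image and re-run the classifier.
FLIPPING_EDITS = {("img0", 0), ("img1", 0), ("img2", 0)}


def toy_edited_prediction(image: Hashable, seg_id: Hashable) -> str:
    return "other" if (image, seg_id) in FLIPPING_EDITS else "horse"


images = ["img0", "img1", "img2"]
segments = [
    [("watermark", 0), ("horse", 1)],
    [("watermark", 0), ("grass", 1)],
    [("horse", 0)],
]
original_classes = ["horse", "horse", "horse"]

table = cof_table(images, segments, original_classes, toy_edited_prediction)
print(table.most_common())
```

In this toy data the `watermark` label is counted twice and `horse` once, so a watermark shortcut would surface at the top of the table. Labels whose edits never change the prediction, such as `grass` here, simply do not appear in the counter.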

Abstract

The rise of deep learning in image classification has brought unprecedented accuracy but also highlighted a key issue: the use of 'shortcuts' by models. Such shortcuts are easy-to-learn patterns from the training data that fail to generalise to new data. Examples include the use of a copyright watermark to recognise horses, snowy background to recognise huskies, or ink markings to detect malignant skin lesions. The explainable AI (XAI) community has suggested using instance-level explanations to detect shortcuts without external data, but this requires the examination of many explanations to confirm the presence of such shortcuts, making it a labour-intensive process. To address these challenges, we introduce Counterfactual Frequency (CoF) tables, a novel approach that aggregates instance-based explanations into global insights, and exposes shortcuts. The aggregation implies the need for some semantic concepts to be used in the explanations, which we solve by labelling the segments of an image. We demonstrate the utility of CoF tables across several datasets, revealing the shortcuts learned from them.

Paper Structure (7 sections, 2 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: Component diagram illustrating how a SCAP explanation is produced and how such explanations are aggregated into CoF tables.
  • Figure 2: In the first row, example images from our biased MNIST dataset with their classifications. In the second row, their counterfactual counterparts with the new classifications.
  • Figure 3: Individual SCAP explanation from the biased MNIST dataset.
  • Figure 4: Individual SCAP explanation from the BAR dataset.
  • Figure 5: Individual SCAP explanation for rowing boat class of ImageNet.
  • ...and 4 more figures