
Do ImageNet-trained models learn shortcuts? The impact of frequency shortcuts on generalization

Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio

TL;DR

This work asks whether ImageNet-trained models rely on frequency shortcuts. It introduces HFSS, a hierarchical Fourier-domain search that efficiently uncovers the joint frequency subsets responsible for classification. It quantifies shortcut reliance with Dominant Frequency Maps (DFMs) and the class-wise true positive rate ($TPR$), then links shortcut learning to generalization in-distribution and across diverse out-of-distribution settings, including texture-rich and rendition-based data, as well as under adversarial perturbations. The key contributions are the HFSS method, evidence that both CNNs and transformers learn frequency shortcuts on ImageNet, and a nuanced view: shortcuts can boost robustness or generalization in texture-preserving OOD settings but hurt generalization to renditions, which points to gaps in current OOD benchmarks. This work provides a practical framework to evaluate and account for frequency shortcuts in model generalization, and it encourages benchmarks that explicitly consider such shortcuts for more robust and transferable learning.
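The TL;DR names two measurement tools, Dominant Frequency Maps and class-wise $TPR$. The NumPy sketch below shows one way they could be computed, assuming a DFM is a binary mask over the centered 2D Fourier spectrum and that "DFM-filtered" means keeping only the masked frequencies before the inverse transform. The names `dfm_filter`, `classwise_tpr`, and `shortcut_classes` are hypothetical, and the rule in `shortcut_classes` (a class is flagged when its TPR on DFM-filtered images reaches the threshold $t$ of Figure 3) is an assumed reading of that figure's caption, not a definition taken from the paper.

```python
import numpy as np


def dfm_filter(image: np.ndarray, dfm: np.ndarray) -> np.ndarray:
    """Keep only the frequencies selected by a binary mask (assumed DFM layout).

    image: (H, W) or (H, W, C) real-valued array.
    dfm:   (H, W) binary mask laid out on the centered (fftshift-ed) spectrum.
    """
    # 2D FFT over the spatial axes, then move the zero frequency to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    mask = dfm if image.ndim == 2 else dfm[..., None]  # broadcast over channels
    # Undo the shift, invert, and keep the real part; an imaginary residue
    # appears when the mask is not conjugate-symmetric.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1))
    return np.real(filtered)


def classwise_tpr(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class true positive rate: correct predictions / samples of that class."""
    tpr = np.full(num_classes, np.nan)  # NaN for classes with no samples
    for c in range(num_classes):
        in_class = y_true == c
        if in_class.any():
            tpr[c] = np.mean(y_pred[in_class] == c)
    return tpr


def shortcut_classes(tpr_on_filtered: np.ndarray, t: float) -> np.ndarray:
    """Hypothetical criterion: indices of classes whose TPR on DFM-filtered images is >= t."""
    return np.flatnonzero(np.nan_to_num(tpr_on_filtered, nan=-1.0) >= t)
```

With an all-ones mask the image comes back unchanged, and a mask that keeps only the zero-frequency bin returns each channel's mean; both are quick sanity checks for the shift convention.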

Abstract

Frequency shortcuts refer to specific frequency patterns that models heavily rely on for correct classification. Previous studies have shown that models trained on small image datasets often exploit such shortcuts, potentially impairing their generalization performance. However, existing methods for identifying frequency shortcuts require expensive computations and become impractical for analyzing models trained on large datasets. In this work, we propose the first approach to efficiently analyze frequency shortcuts at a large scale. We show that both CNN and transformer models learn frequency shortcuts on ImageNet. We also show that frequency shortcut solutions can yield good performance on out-of-distribution (OOD) test sets that largely retain texture information. However, these shortcuts, which are mostly aligned with texture patterns, hinder model generalization on rendition-based OOD test sets. These observations suggest that current OOD evaluations often overlook the impact of frequency shortcuts on model generalization. Future benchmarks could thus benefit from explicitly assessing and accounting for these shortcuts, in order to build models that generalize across a broader range of OOD scenarios.


Paper Structure

This paper contains 47 sections, 16 figures, 9 tables.

Figures (16)

  • Figure 1: Scheme of HFSS. Starting from stage 2, we sample frequency patches from a random frequency subset searched in the previous stage. This confines the size of the search space. The white patches in the binary masks indicate sampled frequency patches.
  • Figure 2: The frequency spectrum is separated into patches, with $p\%$ sampled for shortcut evaluation.
  • Figure 3: ResNet18 tested on original test images and DFM-filtered images of ImageNet-1k. The blue line shows the average TPR on the original test images of classes subject to shortcuts, and the orange line shows the results for non-shortcut classes, at different thresholds $t$. The green and red lines correspond to results on DFM-filtered images. A lower $t$ indicates weak shortcuts and a higher $t$ signifies stronger ones. The size of each point reflects the number of classes it covers: larger points include more classes.
  • Figure 4: The best class-wise loss vs. the number of sampled frequency subsets at each stage.
  • Figure 5: Average TPR vs. search time, where search time increases proportionally with the number of sampled frequency subsets.
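Figures 1 and 2 describe the search at a high level: the spectrum is split into patches, $p\%$ of them are sampled to form a candidate frequency subset, and from stage 2 onward patches are sampled only inside a subset searched in the previous stage. The pure-Python sketch below illustrates that staged sampling under stated assumptions: a square grid of `grid * grid` patches, a caller-supplied `loss_fn` standing in for the model's class-wise loss on images filtered to a subset (as plotted in Figure 4), and a parent subset chosen at random among the `keep` lowest-loss subsets of the previous stage. All names (`hierarchical_search`, `sample_subsets`, `patches_to_mask`, `keep`) are hypothetical; this is not the authors' HFSS implementation.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Tuple

Patch = Tuple[int, int]    # (row, col) index of a frequency patch
Subset = FrozenSet[Patch]  # one candidate joint frequency subset


def sample_subsets(pool: List[Patch], p: float, n: int, rng: random.Random) -> List[Subset]:
    """Draw n random subsets, each holding a fraction p of the patches in pool."""
    k = max(1, round(len(pool) * p))
    return [frozenset(rng.sample(pool, k)) for _ in range(n)]


def hierarchical_search(
    grid: int,
    p: float,
    n_per_stage: int,
    stages: int,
    loss_fn: Callable[[Subset], float],
    keep: int = 4,
    seed: int = 0,
) -> Tuple[Optional[Subset], float]:
    """Staged random search over frequency patches (hypothetical HFSS-like sketch)."""
    rng = random.Random(seed)
    # Stage 1 samples from every patch of the spectrum.
    pool: List[Patch] = [(i, j) for i in range(grid) for j in range(grid)]
    best_subset: Optional[Subset] = None
    best_loss = float("inf")
    for _ in range(stages):
        candidates = sample_subsets(pool, p, n_per_stage, rng)
        ranked = sorted(candidates, key=loss_fn)  # lowest loss first
        stage_loss = loss_fn(ranked[0])
        if stage_loss < best_loss:
            best_subset, best_loss = ranked[0], stage_loss
        # The next stage only samples inside one randomly chosen top-ranked
        # subset from this stage, which confines the search space.
        pool = sorted(rng.choice(ranked[:keep]))
    return best_subset, best_loss


def patches_to_mask(subset: Subset, grid: int, size: int) -> List[List[int]]:
    """Binary size x size mask with 1s on the selected patches (assumes size % grid == 0)."""
    step = size // grid
    mask = [[0] * size for _ in range(size)]
    for i, j in subset:
        for r in range(i * step, (i + 1) * step):
            for c in range(j * step, (j + 1) * step):
                mask[r][c] = 1
    return mask


if __name__ == "__main__":
    # Toy loss (hypothetical): prefer subsets concentrated on a few low-index patches.
    target = {(0, 0), (0, 1), (1, 0)}
    toy_loss = lambda s: -len(s & target) / len(s)
    subset, loss = hierarchical_search(grid=4, p=0.5, n_per_stage=20, stages=3, loss_fn=toy_loss)
    print(sorted(subset), loss)
```

Because each stage samples a fraction $p$ of the previous subset, candidate subsets shrink geometrically (8, then 4, then 2 patches on a 4 x 4 grid with $p = 0.5$), which is one way the search space stays bounded.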