Do ImageNet-trained models learn shortcuts? The impact of frequency shortcuts on generalization
Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
TL;DR
This work asks whether ImageNet-trained models rely on frequency shortcuts. It introduces HFSS, a hierarchical Fourier-domain search that efficiently finds the joint frequency subsets a model depends on for classification. Shortcut reliance is quantified with Dominant Frequency Maps and the class-wise true positive rate (TPR), then related to generalization on in-distribution data, on out-of-distribution (OOD) data that is either texture-rich or rendition-based, and under adversarial perturbations. The key contributions are the HFSS method, evidence that both CNNs and transformers learn frequency shortcuts on ImageNet, and the finding that shortcuts can improve robustness and generalization on texture-preserving OOD data but hurt generalization on rendition-based data, which exposes a gap in current OOD benchmarks. The resulting framework makes it practical to evaluate frequency shortcuts and encourages benchmarks that account for them explicitly.
Abstract
Frequency shortcuts are specific frequency patterns that models rely on heavily for correct classification. Previous studies have shown that models trained on small image datasets often exploit such shortcuts, potentially impairing their generalization performance. However, existing methods for identifying frequency shortcuts are computationally expensive and become impractical for analyzing models trained on large datasets. In this work, we propose the first approach for efficiently analyzing frequency shortcuts at large scale. We show that both CNN and transformer models learn frequency shortcuts on ImageNet. We also show that frequency shortcut solutions can yield good performance on out-of-distribution (OOD) test sets that largely retain texture information. However, these shortcuts, which are mostly aligned with texture patterns, hinder model generalization on rendition-based OOD test sets. These observations suggest that current OOD evaluations often overlook the impact of frequency shortcuts on model generalization. Future benchmarks could thus benefit from explicitly assessing and accounting for these shortcuts to build models that generalize across a broader range of OOD scenarios.
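To make the notion of a frequency shortcut concrete, the sketch below shows the basic measurement such an analysis rests on: keep only a chosen subset of an image's Fourier coefficients, classify the filtered images, and compute the per-class true positive rate. It is a minimal illustration in NumPy, not the authors' method. The function names (`apply_frequency_mask`, `class_wise_tpr`, `low_pass_mask`) and the disc-shaped mask are assumptions made for this example, and the search over frequency subsets is not reproduced here.

```python
import numpy as np


def apply_frequency_mask(image, mask):
    """Keep only the frequencies selected by a binary mask.

    `image` has shape (H, W) or (H, W, C); `mask` has shape (H, W) and is
    defined on the centered spectrum (zero frequency at [H // 2, W // 2]).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    if image.ndim == 3:
        mask = mask[..., None]  # apply the same mask to every channel
    filtered = spectrum * mask
    restored = np.fft.ifft2(np.fft.ifftshift(filtered, axes=(0, 1)), axes=(0, 1))
    # An asymmetric mask leaves a small imaginary part; only the real part is kept.
    return np.real(restored)


def class_wise_tpr(predictions, labels, num_classes):
    """For each class c, the fraction of samples with label c predicted as c.

    Classes with no samples get NaN.
    """
    tpr = np.full(num_classes, np.nan)
    for c in range(num_classes):
        is_c = labels == c
        if is_c.any():
            tpr[c] = np.mean(predictions[is_c] == c)
    return tpr


def low_pass_mask(height, width, radius):
    """Hypothetical example mask: a disc of low frequencies around the center."""
    ys, xs = np.ogrid[:height, :width]
    cy, cx = height // 2, width // 2
    return ((ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2).astype(float)
```

Under these assumptions, a class whose TPR stays high when images are reduced to a small frequency subset would be a candidate for a frequency shortcut. The model that produces `predictions` is left out on purpose, since any classifier could be plugged in.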
