Can Biases in ImageNet Models Explain Generalization?

Paul Gavrikov, Janis Keuper

TL;DR

This study demonstrates that no single bias fully accounts for generalization across ImageNet models when the architecture is fixed to ResNet-50. By evaluating 48 models trained with diverse methods and measuring texture/shape bias, spectral bias, and the critical band, the authors assess correlations against broad generalization benchmarks spanning in-distribution performance, robustness, conceptual changes, and adversarial robustness. The results reveal many outliers and non-monotonic relationships: some biases (notably reliance on high-frequency cues) relate modestly to generalization, while others (such as shape bias or the center frequency of the critical band) fail to predict holistic performance. The work underscores the complexity of generalization, cautions against optimizing for a single bias, and provides standardized benchmarking together with access to all checkpoints and evaluation code to support further investigation into robust, transferable representations.

Abstract

The robust generalization of models to rare, in-distribution (ID) samples drawn from the long tail of the training distribution and to out-of-training-distribution (OOD) samples is one of the major challenges of current deep learning methods. For image classification, this manifests in the existence of adversarial attacks, performance drops on distorted images, and a lack of generalization to concepts such as sketches. The current understanding of generalization in neural networks is very limited, but some biases that differentiate models from human vision have been identified and might be causing these limitations. Consequently, several attempts with varying success have been made to reduce these biases during training to improve generalization. We take a step back and sanity-check these attempts. Fixing the architecture to the well-established ResNet-50, we perform a large-scale study on 48 ImageNet models obtained via different training methods to understand whether and how these biases (including shape bias, spectral biases, and critical bands) interact with generalization. Our results reveal that, contrary to previous findings, these biases are insufficient to accurately predict the generalization of a model holistically. We provide access to all checkpoints and evaluation code at https://github.com/paulgavrikov/biases_vs_generalization

Paper Structure (50 sections, 1 equation, 16 figures, 2 tables)

Figures (16)

  • Figure 1: We study the influence of three selected biases that separate models from humans on the generalization of ImageNet models. Our study suggests that no single bias correlates with generalization in a holistic sense. We measure the texture/shape bias (Geirhos et al., 2018), critical band (Subramanian et al., 2023), and low/high-frequency spectral biases (Wang et al., 2020) on 48 models and correlate these biases against generalization that we measure on several benchmarks belonging to four categories: in-distribution, robustness, conceptual changes, and adversarial robustness.
  • Figure 2: Biases often only correlate with specific aspects of generalization or model groups. We measure Spearman $r$ correlations on all models (Total) and separately on adversarially trained (AT) models and on all other models, as the two groups often follow different trends. Non-significant correlations with $p\geq 0.05$ are set to 0. Please note that $r$ does not capture non-monotonic relations.
  • Figure 3: Many benchmarks show notable positive correlations with each other; the exceptions are SIN and PGD (adversarial attack). Correlations are measured by Spearman $r$. We set non-significant correlations with $p\geq 0.05$ to 0.
  • Figure 4: Shape Bias vs. Generalization. A value of 0 indicates a texture bias, and a value of 1 indicates a shape bias.
  • Figure 5: Spectral Biases vs. Generalization. We correlate low-frequency bias (top) and high-frequency bias (bottom) with generalization.
  • ...and 11 more figures
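
The correlation procedure described in the captions of Figures 2 and 3 (Spearman $r$, with non-significant values at $p\geq 0.05$ set to 0, reported for all models and separately for AT and other models) can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names `masked_spearman` and `correlations_by_group` are hypothetical, and the bias and benchmark values are synthetic, chosen only to show how opposing group trends can cancel in the total.

```python
import numpy as np
from scipy.stats import spearmanr


def masked_spearman(bias, score, alpha=0.05):
    """Spearman r between two 1-D arrays, set to 0.0 when p >= alpha.

    Hypothetical helper mirroring the masking rule stated in the captions.
    """
    r, p = spearmanr(bias, score)
    return float(r) if p < alpha else 0.0


def correlations_by_group(bias, score, is_at, alpha=0.05):
    """Masked Spearman r on all models, AT models, and all other models."""
    bias = np.asarray(bias, dtype=float)
    score = np.asarray(score, dtype=float)
    is_at = np.asarray(is_at, dtype=bool)
    return {
        "Total": masked_spearman(bias, score, alpha),
        "AT": masked_spearman(bias[is_at], score[is_at], alpha),
        "Other": masked_spearman(bias[~is_at], score[~is_at], alpha),
    }


if __name__ == "__main__":
    # Synthetic data (an assumption, not results from the paper):
    # six AT models whose score rises with the bias, and six other
    # models whose score falls with the bias.
    bias = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] * 2
    score = [10, 20, 30, 40, 50, 60] + [60, 50, 40, 30, 20, 10]
    is_at = [True] * 6 + [False] * 6
    for group, r in correlations_by_group(bias, score, is_at).items():
        print(f"{group}: r = {r:+.2f}")
```

In this synthetic case the two groups have perfectly monotonic but opposite trends, so the correlation over all models is non-significant and is masked to 0. This is one reason for reporting the groups separately. As the Figure 2 caption notes, Spearman $r$ would also miss non-monotonic relations within a group.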