
FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images

Elena Camuffo, Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay

TL;DR

This work tackles robust object recognition under severe image corruptions by introducing FROST, a test-time method that uses high-frequency FFT features to identify the corruption type and select normalization statistics accordingly. It constructs corruption prototypes from the first $n$ high-frequency FFT amplitudes, with $n=15$, across synthetic corruptions with intensity levels $\lambda \in \{1,2,3,4,5\}$, and maps these prototypes to corruption-generic or corruption-specific statistics via a codebook. At inference, it matches the input FFT signature to prototypes to choose $S^*$ and applies it to BN/LN layers, employing a confidence threshold $T$ to handle uncertainty. Empirically on ImageNet-C, FROST yields state-of-the-art mean corruption error reductions while preserving clean accuracy, with low memory overhead and broad applicability across architectures.
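The selection step summarized above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation, and it rests on several assumptions the summary does not confirm: "high-frequency" is taken to mean the $n$ spectrum bins farthest from the DC component, matching uses cosine similarity, and the threshold $T$ is applied to that similarity, falling back to corruption-generic statistics when no prototype is close enough. The names `fft_signature`, `select_statistics`, and `generic_key` are hypothetical.

```python
import numpy as np

def fft_signature(image, n=15):
    """Return n high-frequency FFT amplitudes of a 2-D grayscale image.

    Assumption: "high-frequency" means the n bins farthest from the DC
    component of the centered spectrum; the paper may order them differently.
    """
    amp = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = amp.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)
    # Stable sort so ties between equally distant bins are deterministic.
    order = np.argsort(-radius.ravel(), kind="stable")[:n]
    return amp.ravel()[order]

def select_statistics(signature, prototypes, codebook, generic_key="generic", T=0.9):
    """Choose statistics S* by matching a signature to corruption prototypes.

    Assumption: cosine similarity is the matching score, and a best score
    below T falls back to the corruption-generic entry of the codebook.
    """
    best_key, best_sim = generic_key, -1.0
    for name, proto in prototypes.items():
        denom = np.linalg.norm(signature) * np.linalg.norm(proto) + 1e-12
        sim = float(signature @ proto / denom)
        if sim > best_sim:
            best_key, best_sim = name, sim
    if best_sim < T:
        best_key = generic_key
    return codebook[best_key], best_sim
```

Here `prototypes` maps a corruption name to its prototype vector and `codebook` maps the same names (plus the generic key) to stored normalization statistics. How the prototypes are aggregated across the intensity levels $\lambda$ is left open in this sketch.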

Abstract

Improving model robustness to corrupted images is among the key challenges in enabling robust vision systems on smart devices, such as robotic agents. In particular, robust test-time performance is imperative for most applications. This paper presents a novel approach to improve the robustness of any classification model, especially on severely corrupted images. Our method (FROST) employs high-frequency features to detect the corruption type of the input image and to select layer-wise feature normalization statistics. FROST provides state-of-the-art results for different models and datasets, outperforming competitors on ImageNet-C by up to 37.1% relative gain, improving baseline of 40.9% mCE on severe corruptions.

Paper Structure

This paper contains 4 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overall pipeline of FROST. At training time, we construct (i) corruption-specific prototypes using high-frequency FFT features and (ii) corruption-specific feature normalization statistics. At test time, we extract FFT features and perform inference via prototype matching to select the most suitable statistics.
  • Figure: Original k-means
  • Figure: Mean Variance
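
The Figure 1 caption mentions corruption-specific feature normalization statistics that are selected at test time. A minimal sketch of what computing and applying such statistics might look like for a batch-normalization layer is given below. It assumes features in `(N, C, H, W)` layout and per-channel mean and variance as the stored statistics; the function names are hypothetical and the paper's actual procedure for optimizing the statistics is not reproduced here.

```python
import numpy as np

def channel_statistics(features):
    """Per-channel mean and variance over batch and spatial axes (N, C, H, W)."""
    mean = features.mean(axis=(0, 2, 3))
    var = features.var(axis=(0, 2, 3))
    return mean, var

def batch_norm_with(features, stats, gamma, beta, eps=1e-5):
    """Normalize features with externally selected statistics (mean, var).

    This is the standard batch-norm formula; only the source of the
    statistics changes (selected per corruption instead of fixed).
    """
    mean, var = stats
    shape = (1, -1, 1, 1)  # broadcast per-channel values over N, H, W
    x_hat = (features - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)
```

In this sketch, statistics gathered from features of images with one corruption type would be stored under that corruption's codebook entry and passed as `stats` whenever that corruption is detected.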