A Survey on the Robustness of Computer Vision Models against Common Corruptions

Shunxin Wang; Raymond Veldhuis; Christoph Brune; Nicola Strisciuglio

TL;DR

This survey presents a comprehensive overview of methods that improve the robustness of computer vision models against common corruptions, and categorizes methods into three groups based on the model components and training methods they target: data augmentation, learning strategies, and network components.

Abstract

The performance of computer vision models is susceptible to unexpected changes in input images caused by sensor errors or extreme imaging environments, known as common corruptions (e.g. noise, blur, illumination changes). These corruptions can significantly hinder the reliability of these models when deployed in real-world scenarios, yet they are often overlooked when testing model generalization and robustness. In this survey, we present a comprehensive overview of methods that improve the robustness of computer vision models against common corruptions. We categorize methods into three groups based on the model components and training methods they target: data augmentation, learning strategies, and network components. We release a unified benchmark framework (available at https://github.com/nis-research/CorruptionBenchCV) to compare robustness performance across several datasets, and we address the inconsistencies of evaluation practices in the literature. Our experimental analysis highlights the base corruption robustness of popular vision backbones, revealing that corruption robustness does not necessarily scale with model size and data size. Relative to their increased computational requirements, large models gain negligible robustness improvements. To achieve generalizable and robust computer vision models, we foresee the need to develop new learning strategies that efficiently exploit limited data and mitigate unreliable learning behaviors.

Paper Structure (22 sections, 9 equations, 9 figures, 6 tables)

Introduction
Background and motivation
Comparison with other surveys
Contributions
Related topics
Datasets and metrics
Benchmark datasets for corruption robustness
Evaluation metrics
Mean corruption and relative corruption error
Mean flip rate
Mean top-5 distance
Expected calibration error
Methods for improving corruption robustness
Data Augmentation
Learning strategies
...and 7 more sections

Figures (9)

Figure 1: The number of publications related to robustness against common image corruptions, and their citation counts, from 2012 to 2023 (searched with keywords: 'image corruption', 'corruption robustness', 'robustness to corruption', excluding 'label noise' and 'noisy labels', and generated from https://www.webofscience.com/wos/woscc/summary/dd9274d2-a790-4ef0-85e7-0405b1f4152c-4ed603b7/date-descending/1).
Figure 2: (a) An image from ImageNet and its corrupted versions from (b) ImageNet-$\mathrm{C}$, (c) ImageNet-$\mathrm{\bar{C}}$ (perceptually different from ImageNet-$\mathrm{C}$), and (d) ImageNet-$\mathrm{3DCC}$ (using 3D geometric information to improve realism).
Figure 3: Taxonomy of methods improving corruption robustness.
Figure 4: Self-supervised (chen2020simple) and supervised (khosla2021supervised) contrastive learning. Without ground truth labels, self-supervised learning may place representations of images from the same class far away from each other in the latent space. Supervised learning brings the representations of images from the same class closer together in the latent space, but this might result in class collapse, where all images from the same class map to the same representation.
Figure 5: In disentangled learning, the representation of images is separated into two parts: style and content codes. The content code represents the semantics while the style code represents appearance and corruption information.
...and 4 more figures
