Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance

Huakun Shen; Boyue Caroline Hu; Krzysztof Czarnecki; Lina Marsso; Marsha Chechik

TL;DR

This work introduces visually-continuous corruption robustness (VCR) to evaluate neural networks (NNs) against a continuous spectrum of perceptual image degradations, aligning robustness assessment with human vision. It quantifies perceived visual change with a metric $\Delta_v$ and instantiates two properties, accuracy and prediction consistency, yielding $\mathcal{R}_a$ and $\mathcal{R}_p$. The authors propose two human-aware metrics, HMRI and MRSI, to compare NN VCR against human performance, and validate the approach with 14 corruptions, 7,718 human participants, and diverse NN architectures, including vision transformers. Key findings show a larger robustness gap between NNs and human perception than previously reported, particularly for blur, and demonstrate that data augmentation guided by VCR can reduce this gap. The work provides an open benchmark and toolbox to measure and improve NN robustness in a human-centric, continuous framework.

Abstract

While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR), an extension of corruption robustness that allows assessing it over the wide and continuous range of changes that correspond to human perceptual quality (i.e., from the original image to the full distortion of all perceived visual information), along with two novel human-aware metrics for NN evaluation. To compare VCR of NNs with human perception, we conducted extensive experiments on 14 commonly used image corruptions with 7,718 human participants and state-of-the-art robust NN models with different training objectives (e.g., standard, adversarial, corruption robustness), different architectures (e.g., convolutional NNs, vision transformers), and different amounts of training data augmentation. Our study showed that: 1) assessing robustness against continuous corruption can reveal insufficient robustness undetected by existing benchmarks; as a result, 2) the gap between NN and human robustness is larger than previously known; and finally, 3) some image corruptions have a similar impact on human perception, offering opportunities for more cost-effective robustness assessments. Our validation set with 14 image corruptions, human robustness data, and the evaluation code is provided as a toolbox and a benchmark.
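To make the idea of assessing robustness over a continuous range of visual change more concrete, here is a minimal sketch. It is not the paper's estimator: it assumes each corrupted test image comes with a visual-change value scaled to [0, 1] and a flag saying whether the model classified it correctly, and it summarizes accuracy as a plain average over equal-width bins of that value. The function names `binned_accuracy_curve` and `mean_over_bins`, the [0, 1] scaling, and the binning scheme are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def binned_accuracy_curve(
    delta_v: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> List[Optional[float]]:
    """Mean accuracy per equal-width bin of visual change.

    Assumption (not from the paper): delta_v values lie in [0, 1].
    Bins that receive no samples are reported as None.
    """
    if len(delta_v) != len(correct):
        raise ValueError("delta_v and correct must have the same length")
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")

    hits = [0] * num_bins
    counts = [0] * num_bins
    for d, c in zip(delta_v, correct):
        if not 0.0 <= d <= 1.0:
            raise ValueError("visual-change values must lie in [0, 1]")
        # A value of exactly 1.0 is placed in the last bin.
        idx = min(int(d * num_bins), num_bins - 1)
        hits[idx] += 1 if c else 0
        counts[idx] += 1

    return [h / n if n else None for h, n in zip(hits, counts)]


def mean_over_bins(curve: Sequence[Optional[float]]) -> float:
    """Average the per-bin accuracies, skipping empty bins."""
    filled = [a for a in curve if a is not None]
    if not filled:
        raise ValueError("no bin contains any samples")
    return sum(filled) / len(filled)
```

Averaging per bin rather than per sample keeps a test set that is concentrated at a few corruption levels from dominating the summary, which is one way to reflect the motivation for covering the whole range of visual change.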

Paper Structure (20 sections, 4 equations, 32 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Visually-Continuous Corruption Robustness (VCR)
Visually-Continuous Corruption Robustness (VCR) Definition
Testing VCR
Human-Aware Metrics for VCR
Experiments
Experiment 1: Testing Robustness against Visual Corruption
Experiment 2: VCR of DNNs Compared with Humans
Experiment 3: Training with Data Augmentation
Experiment 4: VCR for Visually Similar Corruption Functions
Conclusion
Implementation and Data
Overview of VCR-Bench
VCR and Its Estimation
...and 5 more sections

Figures (32)

Figure 1: Summary of VCR definitions with respect to accuracy and consistency.
Figure 2: Auxiliary VCR metrics to compute HMRI and MRSI.
Figure 3: Image corruption functions.
Figure 4: Histograms showing $\Delta_v$ distribution between ImageNet-C and our VCR test sets for Gaussian Blur.
Figure 5: Comparison between ImageNet-C and VCR with Gaussian Noise. Models discussed in the text are marked by a red triangle.
...and 27 more figures

Theorems & Definitions (3)

Definition: Human-Relative Model Robustness Index (HMRI)
Definition: Model Robustness Superiority Index (MRSI)
Definition
