
Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance

Huakun Shen, Boyue Caroline Hu, Krzysztof Czarnecki, Lina Marsso, Marsha Chechik

TL;DR

This work introduces visually-continuous corruption robustness (VCR) to evaluate neural networks against a continuous spectrum of perceptual image degradations, aligning robustness assessment with human vision. It quantifies perceived visual change with the metric $\Delta_v$ and instantiates VCR for two properties, accuracy and prediction consistency, yielding the metrics $\mathcal{R}_a$ and $\mathcal{R}_p$. To compare the VCR of NNs against human performance, the authors propose two indices, the Human-Relative Model Robustness Index (HMRI) and the Model Robustness Superiority Index (MRSI). They validate the approach with 14 corruptions, 7,718 human participants, and diverse NN architectures, including vision transformers. Key findings show a larger robustness gap between NNs and human perception than previously reported, particularly for blur, and demonstrate that data augmentation guided by VCR can reduce this gap. The work provides an open benchmark and toolbox to measure and improve NN robustness in a human-centric, continuous framework.
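The summary names $\mathcal{R}_a$ and $\mathcal{R}_p$ but does not give their formulas. The Python sketch below is one plausible way to estimate them, assuming each test image already has a precomputed $\Delta_v$ in [0, 1] and that robustness is summarized as the area under the accuracy (or consistency) curve over $\Delta_v$. The equal-width binning, the trapezoid rule, and the function names `binned_rate`, `area_on_unit_interval`, and `vcr_estimates` are hypothetical illustration choices, not the paper's estimator.

```python
from typing import List, Sequence, Tuple


def binned_rate(deltas: Sequence[float], flags: Sequence[bool],
                n_bins: int = 10) -> Tuple[List[float], List[float]]:
    """Mean of 0/1 outcomes in equal-width Delta_v bins on [0, 1].

    Returns (bin_centers, rates) for non-empty bins only.
    """
    if len(deltas) != len(flags) or not deltas:
        raise ValueError("need equal-length, non-empty inputs")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d, f in zip(deltas, flags):
        if not 0.0 <= d <= 1.0:
            raise ValueError(f"Delta_v must lie in [0, 1], got {d}")
        i = min(int(d * n_bins), n_bins - 1)  # Delta_v == 1.0 falls in the last bin
        sums[i] += float(f)
        counts[i] += 1
    centers = [(i + 0.5) / n_bins for i in range(n_bins) if counts[i]]
    rates = [sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    return centers, rates


def area_on_unit_interval(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal area over [0, 1], holding the first/last value flat to the ends."""
    xs = [0.0, *xs, 1.0]
    ys = [ys[0], *ys, ys[-1]]
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


def vcr_estimates(deltas: Sequence[float], correct: Sequence[bool],
                  consistent: Sequence[bool], n_bins: int = 10) -> Tuple[float, float]:
    """Hypothetical (R_a, R_p) estimates from per-image records.

    correct[i]:    the prediction on corrupted image i matches its label.
    consistent[i]: the prediction on corrupted image i matches the prediction
                   on the original (uncorrupted) image.
    """
    r_a = area_on_unit_interval(*binned_rate(deltas, correct, n_bins))
    r_p = area_on_unit_interval(*binned_rate(deltas, consistent, n_bins))
    return r_a, r_p
```

For example, `vcr_estimates([0.1, 0.2, 0.6, 0.9], [True, True, False, False], [True, True, True, False], n_bins=2)` returns `(0.5, 0.75)`. Binning over $\Delta_v$ rather than over a corruption parameter is the point of the sketch: it measures robustness along perceived change, so corruptions with different parameterizations land on a common axis.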

Abstract

While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR), an extension of corruption robustness that allows it to be assessed over the wide, continuous range of changes corresponding to human perceptual quality (i.e., from the original image to the full distortion of all perceived visual information), along with two novel human-aware metrics for NN evaluation. To compare VCR of NNs with human perception, we conducted extensive experiments on 14 commonly used image corruptions with 7,718 human participants and state-of-the-art robust NN models with different training objectives (e.g., standard, adversarial, corruption robustness), different architectures (e.g., convolutional NNs, vision transformers), and different amounts of training data augmentation. Our study showed that: 1) assessing robustness against continuous corruption can reveal insufficient robustness undetected by existing benchmarks; as a result, 2) the gap between NN and human robustness is larger than previously known; and finally, 3) some image corruptions have a similar impact on human perception, offering opportunities for more cost-effective robustness assessments. Our validation set with 14 image corruptions, human robustness data, and the evaluation code is provided as a toolbox and a benchmark.

Paper Structure (20 sections, 4 equations, 32 figures, 3 tables, 1 algorithm)

Figures (32)

  • Figure 1: Summary of VCR definitions with respect to accuracy and consistency.
  • Figure 2: Auxiliary VCR metrics to compute HMRI and MRSI.
  • Figure 3: Image corruption functions.
  • Figure 4: Histograms comparing the $\Delta_v$ distributions of ImageNet-C and our VCR test sets for Gaussian Blur.
  • Figure 5: Comparison between ImageNet-C and VCR with Gaussian Noise. Models discussed in the text are marked by a red triangle.
  • ...and 27 more figures

Theorems & Definitions (3)

  • Definition: Human-Relative Model Robustness Index (HMRI)
  • Definition: Model Robustness Superiority Index (MRSI)
  • Definition (untitled)
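The definitions of HMRI and MRSI are listed by name only. As a rough illustration of how two robustness curves over $\Delta_v$ could be compared, the Python sketch below assumes HMRI is the area the model shares with the human curve divided by the human area, $\int \min(m, h) \, d\Delta_v / \int h \, d\Delta_v$, and MRSI is the area where the model exceeds humans divided by the model area, $\int \max(m - h, 0) \, d\Delta_v / \int m \, d\Delta_v$. These formulas, the shared sampling grid, and the name `hmri_mrsi` are assumptions for illustration; the paper's own definitions should be consulted for the exact indices.

```python
from typing import Sequence, Tuple


def _trapz(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal integral of ys over xs."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0
               for k in range(len(xs) - 1))


def hmri_mrsi(grid: Sequence[float], model: Sequence[float],
              human: Sequence[float]) -> Tuple[float, float]:
    """Illustrative HMRI/MRSI-style ratios (assumed formulas, see lead-in).

    grid:  increasing Delta_v sample points.
    model: model accuracy (or consistency) at each grid point.
    human: human accuracy (or consistency) at each grid point.
    """
    if not (len(grid) == len(model) == len(human)) or len(grid) < 2:
        raise ValueError("grid, model and human need equal length >= 2")
    shared = [min(m, h) for m, h in zip(model, human)]        # robustness both reach
    excess = [max(m - h, 0.0) for m, h in zip(model, human)]  # model beyond human
    area_h, area_m = _trapz(grid, human), _trapz(grid, model)
    hmri = _trapz(grid, shared) / area_h if area_h > 0 else float("nan")
    mrsi = _trapz(grid, excess) / area_m if area_m > 0 else float("nan")
    return hmri, mrsi
```

For example, `hmri_mrsi([0.0, 0.5, 1.0], [1.0, 0.5, 0.0], [1.0, 1.0, 0.0])` gives an HMRI of about 0.667 and an MRSI of 0.0: the model reaches two thirds of the human area and never exceeds humans. Because the pointwise `min` and `max` are taken only at grid points, curves that cross between samples are approximated; a finer $\Delta_v$ grid reduces that error.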