RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, Dacheng Tao

TL;DR

RobustART studies how architecture design and training techniques influence the robustness of deep neural networks on ImageNet under adversarial, natural, and system noises. It introduces an open-source benchmark and model zoo covering more than 1,200 architectures (49 human-designed plus 1,200+ derived from neural architecture search) and 10+ training techniques. The study finds that adversarial training improves robustness against all noise types for Transformers and MLP-Mixers. At comparable model sizes and with aligned training settings, CNNs are the most robust to natural and system noises, while Transformers are the most robust to adversarial noise. Architecture type and capacity shape robustness, but for some lightweight architectures, increasing model size or adding extra data does not improve it. By evaluating diverse noise sources on a common platform, RobustART offers practical guidance for designing robust DNNs and supports broader robustness research.

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates benchmarking model robustness. Existing benchmarks mainly focus on evaluating defenses, but there are no comprehensive studies of how architecture design and training techniques affect robustness. Comprehensively benchmarking these relationships is beneficial for better understanding and developing robust DNNs. Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises). Extensive experiments substantiated several insights for the first time, e.g., (1) adversarial training is effective for robustness against all noise types for Transformers and MLP-Mixers; (2) given comparable model sizes and aligned training settings, CNNs > Transformers > MLP-Mixers on robustness against natural and system noises; Transformers > MLP-Mixers > CNNs on adversarial robustness; (3) for some light-weight architectures, increasing model sizes or using extra data cannot improve robustness. Our benchmark presents: (1) an open-source platform for comprehensive robustness evaluation; (2) a variety of pre-trained models to facilitate robustness evaluation; and (3) a new view to better understand the mechanism towards designing robust DNNs. We will continue to develop this ecosystem for the community.

Paper Structure

This paper contains 52 sections, 4 equations, 29 figures, 5 tables.

Figures (29)

  • Figure 1: Our benchmark is built as a modular framework, in which four core modules are provided for users in an easy-to-use way.
  • Figure 2: Robustness on human-designed off-the-shelf architectures. (first row) from left to right: clean accuracy on standard ImageNet, WCAR (small magnitude) under all adversarial attacks, AR under PGD-$\ell_{\infty}$ attack with $\epsilon = 0.5/255$, 1-mCE on ImageNet-C; (second row) from left to right: NmFP on ImageNet-P, accuracy on ImageNet-A, AUPR on ImageNet-O, and NSD on ImageNet-S. Results for different FLOPs are in Section \ref{sec:supp_arch-moreres} of the supplementary materials. Results for different adversarial attacks with different magnitudes are shown in Figures \ref{fig:supp-ddn} to \ref{fig:supp-cw} in the supplementary materials.
  • Figure 3: Transferability heatmap of human-designed off-the-shelf architectures under FGSM attack, $\epsilon = 8/255$. Each value is the attack success rate (ASR) when adversarial examples crafted on a source model are applied to a target model.
  • Figure 4: Robustness studies on NAS-sampled model architectures using supernets including MobileNetV3 and ResNet families. From left to right, we report the effects of model size, input image size, the depth of subnets' last stage, and the sum of convolution kernel sizes on adversarial robustness.
  • Figure 5: Robustness of different model architectures with or without the selected training techniques on adversarial and natural noises. More results for other training techniques are shown in Figures \ref{fig:supp-21kpretrain} to \ref{fig:supp-weightrepara} in Section \ref{sec:supp_technique-moreres} of the supplementary materials.
  • ...and 24 more figures
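
The transferability heatmap described in Figure 3 can be illustrated with a small sketch. The captions above do not state how ASR is normalized, so this example assumes a common convention: ASR is the fraction of samples the target model classifies correctly on clean inputs that it misclassifies once the source model's adversarial examples are applied. The function names, model names, and toy predictions are all hypothetical and are not taken from the RobustART code base.

```python
from typing import Dict, List, Sequence


def attack_success_rate(
    labels: Sequence[int],
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
) -> float:
    """ASR of one source -> target pair.

    Assumption: only samples the target classifies correctly on clean
    inputs are counted; the paper may normalize differently.
    """
    # Indices where the target model is right before any attack.
    correct = [i for i, (y, p) in enumerate(zip(labels, clean_preds)) if y == p]
    if not correct:
        return 0.0
    # Among those, count the ones the adversarial input flips to a wrong class.
    fooled = sum(1 for i in correct if adv_preds[i] != labels[i])
    return fooled / len(correct)


def transfer_matrix(
    labels: Sequence[int],
    clean_preds: Dict[str, List[int]],
    adv_preds: Dict[str, Dict[str, List[int]]],
) -> Dict[str, Dict[str, float]]:
    """Build matrix[source][target] = ASR.

    clean_preds[target] holds the target's predictions on clean inputs.
    adv_preds[source][target] holds the target's predictions on
    adversarial examples crafted against the source model.
    """
    return {
        source: {
            target: attack_success_rate(labels, clean_preds[target], preds)
            for target, preds in per_target.items()
        }
        for source, per_target in adv_preds.items()
    }


# Toy data (hypothetical): four samples, two models.
labels = [0, 1, 2, 3]
clean_preds = {
    "model_a": [0, 1, 2, 3],  # all correct on clean inputs
    "model_b": [0, 1, 2, 0],  # last sample already wrong
}
adv_preds = {
    "model_a": {"model_a": [1, 0, 3, 2], "model_b": [0, 1, 0, 0]},
    "model_b": {"model_a": [0, 1, 2, 0], "model_b": [1, 0, 2, 0]},
}

matrix = transfer_matrix(labels, clean_preds, adv_preds)
for source, row in matrix.items():
    print(source, {target: round(asr, 3) for target, asr in row.items()})
```

In this toy setup the diagonal entries are white-box attacks (source and target are the same model) and the off-diagonal entries measure transfer. Excluding samples the target already gets wrong keeps a weak target from inflating the apparent transfer rate.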