RobustART: Benchmarking Robustness on Architecture Design and Training Techniques
Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, Dacheng Tao
TL;DR
RobustART studies how architecture design and training techniques influence the robustness of deep neural networks on ImageNet under adversarial, natural, and system noises. It introduces an open-source benchmark and model zoo covering 49 human-designed architectures, 1200+ networks derived from neural architecture search, and 10+ training techniques. The study finds that adversarial training improves robustness against all noise types for Transformers and MLP-Mixers. Given comparable model sizes and aligned training settings, CNNs are more robust than Transformers and MLP-Mixers to natural and system noises, while Transformers are the most robust to adversarial noise. For some light-weight architectures, neither larger model sizes nor extra data improves robustness. By evaluating against diverse noise types and providing an open, community-oriented platform, RobustART offers practical guidance for designing robust DNNs and supports broader robustness research.
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates the benchmarking of model robustness. Existing benchmarks mainly focus on evaluating defenses, but there are no comprehensive studies of how architecture design and training techniques affect robustness. Comprehensively benchmarking these relationships is beneficial for better understanding and developing robust DNNs. Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises). Extensive experiments substantiate several insights for the first time, e.g., (1) adversarial training is effective for robustness against all noise types for Transformers and MLP-Mixers; (2) given comparable model sizes and aligned training settings, CNNs > Transformers > MLP-Mixers on robustness against natural and system noises, whereas Transformers > MLP-Mixers > CNNs on adversarial robustness; (3) for some light-weight architectures, increasing model sizes or using extra data cannot improve robustness. Our benchmark presents: (1) an open-source platform for comprehensive robustness evaluation; (2) a variety of pre-trained models to facilitate robustness evaluation; and (3) a new view to better understand the mechanism towards designing robust DNNs. We will continue to develop this ecosystem for the community.
