Robust NAS under adversarial training: benchmark, theory, and beyond
Yongtao Wu, Fanghui Liu, Carl-Johann Simon-Gabriel, Grigorios G Chrysos, Volkan Cevher
TL;DR
This work addresses robust neural architecture search in two ways. First, it releases NAS-RobBench-201, a benchmark that evaluates 6466 NAS-Bench-201 architectures under adversarial training on CIFAR-10/100 and ImageNet-16-120, reporting both robust and clean accuracy. Second, it develops an NTK-based generalization theory for multi-objective NAS that jointly accounts for the standard and adversarial objectives. The theory shows that clean accuracy is governed by a combination of the clean NTK and a robust NTK, while robust accuracy depends on a robust NTK and its twice-perturbed variant, and it provides a lower bound on the minimum eigenvalue to guarantee generalization. Empirically, robust NTK metrics correlate more strongly with robustness than the standard NTK does, and robust benchmarks rank architectures differently from standard benchmarks, which underscores the need for adversarially trained NAS evaluations. Together, the benchmark and the theory offer reproducible evaluations and a theoretical foundation for designing robust NAS algorithms.
Abstract
Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robust architectures, especially when adversarial training is considered. In this work, we address these two challenges and make two contributions. First, we release a comprehensive dataset that covers both clean accuracy and robust accuracy for a vast array of adversarially trained networks from the NAS-Bench-201 search space on image datasets. Second, leveraging the neural tangent kernel (NTK) tool from deep learning theory, we establish a generalization theory for architecture search in terms of clean accuracy and robust accuracy under multi-objective adversarial training. We firmly believe that our benchmark and theoretical insights will significantly benefit the NAS community through reliable reproducibility, efficient assessment, and a theoretical foundation, particularly in the pursuit of robust architectures.
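To make the NTK quantities mentioned above more concrete, the following is a minimal sketch of how a minimum-eigenvalue score could be computed on clean and on adversarially perturbed inputs. It is not the paper's implementation: the two-layer tanh network, the squared loss, the single-step sign-gradient (FGSM-style) perturbation, and every function name here are assumptions chosen for illustration, and the "robust" kernel below is only a simple proxy (the empirical NTK evaluated at perturbed inputs) rather than the exact definition used in the theory.

```python
import numpy as np

def init_params(d, m, seed=0):
    """Random two-layer network f(x) = v^T tanh(W x) / sqrt(m) (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, d)), rng.standard_normal(m)

def forward(W, v, X):
    """Network outputs for a batch X of shape (n, d); returns shape (n,)."""
    return np.tanh(X @ W.T) @ v / np.sqrt(W.shape[0])

def param_jacobian(W, v, X):
    """Jacobian of the outputs w.r.t. all parameters (W flattened, then v)."""
    m = W.shape[0]
    H = np.tanh(X @ W.T)                    # hidden activations, (n, m)
    D = (1.0 - H**2) * v                    # d f / d pre-activation (unscaled), (n, m)
    dW = D[:, :, None] * X[:, None, :]      # d f / d W (unscaled), (n, m, d)
    return np.concatenate([dW.reshape(len(X), -1), H], axis=1) / np.sqrt(m)

def empirical_ntk(W, v, X):
    """Empirical NTK Gram matrix J J^T on the batch X."""
    J = param_jacobian(W, v, X)
    return J @ J.T

def fgsm(W, v, X, y, eps):
    """Single-step l_inf perturbation that increases the squared loss 0.5 * (f(x) - y)^2."""
    m = W.shape[0]
    H = np.tanh(X @ W.T)
    residual = H @ v / np.sqrt(m) - y                # f(x) - y, (n,)
    grad_f_x = ((1.0 - H**2) * v) @ W / np.sqrt(m)   # d f / d x, (n, d)
    return X + eps * np.sign(residual[:, None] * grad_f_x)

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(K)[0])

# Toy data: 16 random points in 8 dimensions with +/-1 labels.
eps = 0.1
W, v = init_params(d=8, m=64)
rng = np.random.default_rng(1)
X = rng.standard_normal((16, 8))
y = rng.choice([-1.0, 1.0], size=16)

X_adv = fgsm(W, v, X, y, eps)
K_clean = empirical_ntk(W, v, X)        # kernel on clean inputs
K_robust = empirical_ntk(W, v, X_adv)   # proxy kernel on perturbed inputs

print("min eigenvalue (clean): ", min_eig(K_clean))
print("min eigenvalue (robust):", min_eig(K_robust))
```

In a search setting, scores of this kind would be computed once per candidate architecture at initialization and then compared against the robust accuracies recorded in a benchmark. How well such a proxy tracks robustness is an empirical question that this toy example does not answer.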
