RNAS-CL: Robust Neural Architecture Search by Cross-Layer Knowledge Distillation

Utkarsh Nath; Yancheng Wang; Yingzhen Yang

TL;DR

RNAS-CL tackles adversarial robustness in neural architecture search by introducing cross-layer knowledge distillation from robust teachers. It jointly searches the student architecture and per-layer tutor mappings using a differentiable framework based on attention-map alignment and Gumbel-Softmax tutor selection, yielding compact architectures with improved robustness without adversarial training. The approach achieves competitive or superior robustness and clean accuracy on CIFAR-10 and ImageNet-100 across multiple teacher models, and ablations confirm the value of intermediate-layer supervision. The work highlights the practical potential of leveraging robust, cross-layer guidance to obtain efficient, resilient neural networks without the cost of robust training, while also suggesting directions to further amplify robustness through training-time enhancements like TRADES.
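To make "attention-map alignment" concrete, here is a minimal sketch in plain Python. It assumes an attention-transfer style definition: the attention map of a layer is the channel-wise sum of absolute activations raised to a power `p`, flattened and L2-normalised, and alignment is measured by Euclidean distance. The function names, the choice `p=2`, and the distance are illustrative assumptions, not the paper's confirmed formulation.

```python
import math

def attention_map(feature, p=2):
    """Collapse a C x H x W feature tensor (nested lists) into a flat,
    L2-normalised spatial attention map by summing |activation|**p over channels.
    (Assumed definition, for illustration only.)"""
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    flat = [
        sum(abs(feature[c][h][w]) ** p for c in range(channels))
        for h in range(height)
        for w in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # avoid division by zero
    return [v / norm for v in flat]

def attention_distance(student_feature, teacher_feature):
    """Euclidean distance between the two normalised attention maps.
    Assumes both features share the same spatial size H x W;
    channel counts may differ."""
    a_s = attention_map(student_feature)
    a_t = attention_map(teacher_feature)
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a_s, a_t)))

# Hypothetical toy features: the student has 1 channel, the teacher has 2.
student = [[[1.0, 0.0], [0.0, 2.0]]]
teacher_same = [[[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 2.0]]]
teacher_other = [[[0.0, 3.0], [1.0, 0.0]]]
d_same = attention_distance(student, teacher_same)
d_other = attention_distance(student, teacher_other)
```

Because the maps are normalised over spatial positions, a student layer can be compared with a teacher layer that has a different number of channels, which is what allows any student layer to be paired with any teacher layer.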

Abstract

Deep Neural Networks are vulnerable to adversarial attacks. Neural Architecture Search (NAS), one of the driving tools of deep neural networks, demonstrates superior prediction accuracy in various machine learning applications. However, it is unclear how it performs against adversarial attacks. Given a robust teacher, it is natural to ask whether NAS can produce a robust neural architecture by inheriting robustness from that teacher. In this paper, we propose Robust Neural Architecture Search by Cross-Layer Knowledge Distillation (RNAS-CL), a novel NAS algorithm that improves the robustness of NAS by learning from a robust teacher through cross-layer knowledge distillation. Unlike previous knowledge distillation methods that encourage close student/teacher outputs only in the last layer, RNAS-CL automatically searches for the best teacher layer to supervise each student layer. Experimental results demonstrate the effectiveness of RNAS-CL and show that it produces small and robust neural architectures.
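The per-layer teacher search can be sketched as follows, assuming each student layer holds one learnable logit per teacher layer and that soft selection weights $g_{ij}$ are drawn with the Gumbel-Softmax relaxation. The names `tutor_logits`, `select_tutors`, and `cross_layer_loss`, the temperature `tau`, and the final argmax rule are hypothetical choices for illustration; in the actual method the logits would be trained by gradient descent, which this stdlib-only sketch does not do.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample relaxed one-hot weights over teacher layers for one student layer."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        gumbel = -math.log(-math.log(u))                 # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                                    # stabilise the softmax
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def select_tutors(tutor_logits, tau=1.0, seed=0):
    """Return sampled weights g[i][j] and, per student layer i, the teacher
    layer with the largest logit (assumed rule for the final choice)."""
    rng = random.Random(seed)
    weights = [gumbel_softmax(row, tau, rng) for row in tutor_logits]
    tutors = [max(range(len(row)), key=row.__getitem__) for row in tutor_logits]
    return weights, tutors

def cross_layer_loss(weights, distances):
    """Sum over i, j of g[i][j] * d[i][j], where d[i][j] is some distance
    between student layer i and teacher layer j."""
    return sum(
        g * d
        for g_row, d_row in zip(weights, distances)
        for g, d in zip(g_row, d_row)
    )

# Hypothetical logits: 2 student layers, 4 teacher layers.
tutor_logits = [[2.0, 0.1, -1.0, 0.0], [0.0, 0.3, 1.5, -0.5]]
weights, tutors = select_tutors(tutor_logits, tau=0.5)
toy_loss = cross_layer_loss([[1.0, 0.0], [0.0, 1.0]], [[0.5, 2.0], [3.0, 0.25]])
```

Lower temperatures push each row of `weights` closer to a one-hot vector, so the weighted loss increasingly reflects a single teacher layer per student layer.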

Paper Structure (27 sections, 3 equations, 12 figures, 8 tables)

This paper contains 27 sections, 3 equations, 12 figures, 8 tables.

Introduction
Contributions
Related Work
Knowledge Distillation
Neural Architecture Search
Efficient and Robust models
Robust Knowledge Distillation for Neural Architecture Search
Attention Map
Tutor Search
Architecture Search
RNAS-CL Loss
Experiments
Implementation Details
Compare Efficient and Robust CIFAR-10 models
Comparison against KD Variants
...and 12 more sections

Figures (12)

Figure 1: Comparison of various SOTA efficient and robust methods on CIFAR-10. Clean Accuracy is top-1 accuracy on clean images. Adversarial Accuracy is top-1 accuracy on images perturbed by a PGD attack. A larger marker indicates a larger architecture. The numbers in brackets are the number of parameters and MACs, respectively.
Figure 2: (a) Training paradigm of RNAS-CL. Attention maps from each student layer are connected to those of each robust teacher layer, and for each student layer we search for the optimal teacher layer. $g_{ij}$ denotes the Gumbel weight between the $i^{th}$ student layer and the $j^{th}$ teacher layer. RNAS-CL induces robustness in the student model through this search. Inspired by FBNetV2 [wan2020fbnetv2], we also search for the number of filters in each layer to build an efficient model. (b) Sample attention maps for input image (i) from low-level (ii), mid-level (iii), and high-level (iv) convolution layers.
Figure 3: Robustness evaluation under different perturbation sizes for PGD and FGSM attacks.
Figure 4: Comparison of various knowledge distillation variants (Similarity [tung2019similarity], VID [ahn2019variational], RKD [park2019relational], CRD [tian2019crd], PKD [passalis2018learning]) against RNAS-CL on the CIFAR-10 dataset. Adversarial Accuracy is top-1 accuracy on images perturbed by a 20-step PGD attack. Clean Accuracy is top-1 accuracy on clean images. A larger marker indicates a larger architecture. For each method, RNAS-CL-S3, RNAS-CL-S5, and RNAS-CL-S7 are represented by increasing marker size.
Figure 5: Adversarial accuracy of various models at various perturbation budgets on the ImageNet-100 dataset.
...and 7 more figures
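
Several captions above report accuracy under PGD and FGSM attacks at different perturbation sizes. As a reminder of what these attacks compute, the sketch below runs an $L_\infty$ PGD attack against a toy logistic-regression model in plain Python. The model, its weights, and the values of `eps`, `alpha`, and `steps` are hypothetical; the paper's experiments attack deep networks on images, and this sketch omits the random start that PGD implementations often use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logs finite
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=20):
    """Take `steps` signed-gradient ascent steps of size `alpha`, projecting back
    onto the L_inf ball of radius `eps` around x and the valid range [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [
            min(max(xa, xo - eps, 0.0), xo + eps, 1.0)
            for xa, xo in zip(x_adv, x)
        ]
    return x_adv

# Hypothetical toy model and input with features in [0, 1].
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, 0.5, 0.5], 1
x_adv = pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=20)
clean_loss, _ = loss_and_grad(x, y, w, b)
adv_loss, _ = loss_and_grad(x_adv, y, w, b)
```

FGSM corresponds to the single-step case `steps=1` with `alpha=eps`. Adversarial accuracy is then the top-1 accuracy measured on inputs such as `x_adv` instead of `x`.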
