Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Celine Lin

TL;DR

This work reveals Robust Scratch Tickets (RSTs): subnetworks with inborn robustness that exist inside randomly initialized networks without any training. By applying a sparse, learnable mask and solving a minimax objective, the authors identify RSTs that match or exceed the robust accuracy of adversarially trained models of comparable size on CIFAR-10/100 and ImageNet. They further show that adversarial examples transfer poorly between RSTs of different sparsity ratios drawn from the same dense network, and they propose Random RST Switch (R2S), a lightweight defense that randomly switches between RSTs during inference to boost robustness across datasets and attacks. The findings suggest that robustness can emerge from weight-location patterns in untrained networks, offering a new lens on robustness and a practical, parameter-efficient defense that extends the lottery ticket hypothesis. The work highlights the potential for robustness without weight training and opens avenues for multi-task, sparse, adversarially robust designs.
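
As a rough illustration of the masking idea described above, the following Python sketch keeps a fixed fraction of frozen random weights by ranking per-weight scores. The score-based top-k selection, the names `topk_mask`, `masked_weights`, and `remaining_ratio`, and the toy sizes are assumptions made for illustration, not details taken from the paper. The minimax search that would learn the scores against adversarial inputs is omitted.

```python
import random

def topk_mask(scores, remaining_ratio):
    """Return a 0/1 mask keeping the highest-scoring fraction of entries."""
    k = max(1, round(len(scores) * remaining_ratio))
    # Indices of the k largest scores; ties are broken by position.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]

def masked_weights(weights, mask):
    """Apply the mask; the random weights themselves are never updated."""
    return [w * m for w, m in zip(weights, mask)]

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10)]  # frozen random initialization
scores = [rng.random() for _ in range(10)]          # hypothetical learnable scores
mask = topk_mask(scores, remaining_ratio=0.3)
subnet = masked_weights(weights, mask)
```

In this sketch only `scores` would change during a search. The weights stay at their random initial values, which is the sense in which the subnetwork is found rather than trained.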

Abstract

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that there exist subnetworks with inborn robustness, matching or surpassing the robust accuracy of the adversarially trained networks with comparable model sizes, within randomly initialized networks without any model training, indicating that adversarial training on model weights is not indispensable towards adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs), which are also by nature efficient. Distinct from the popular lottery ticket hypothesis, neither the original dense networks nor the identified RSTs need to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights regarding the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify the poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a Random RST Switch (R2S) technique, which randomly switches between different RSTs, as a novel defense method built on top of RSTs. We believe our findings about RSTs have opened up a new perspective to study model robustness and extend the lottery ticket hypothesis.
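
The random switching behind R2S can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the uniform choice, the per-input switching granularity, the helper name `r2s_predict`, and the placeholder subnetworks are hypothetical and not taken from the paper.

```python
import random

def r2s_predict(x, subnetworks, rng):
    """Pick one candidate subnetwork at random and run it on x."""
    idx = rng.randrange(len(subnetworks))  # uniform choice (an assumption)
    return idx, subnetworks[idx](x)

# Hypothetical stand-ins for RSTs with different remaining ratios,
# all drawn from the same randomly initialized dense network.
subnetworks = [
    lambda x: "prediction from RST A",
    lambda x: "prediction from RST B",
    lambda x: "prediction from RST C",
]

rng = random.Random(0)
choices = [r2s_predict(None, subnetworks, rng)[0] for _ in range(200)]
```

Because adversarial examples transfer poorly between RSTs of different sparsity ratios, an attack crafted against one candidate is less likely to succeed when a different candidate happens to be selected at inference time.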

Paper Structure

This paper contains 34 sections, 2 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Illustrating RSTs' consistent existence, where (a)$\sim$(d): The robust and natural accuracy of RSTs with different remaining ratios in ResNet18 and WideResNet32 on CIFAR-10/100, respectively; (e)$\sim$(f): The robust accuracy of RSTs with different remaining ratios identified in ResNet18 under different initialization methods on CIFAR-10/100, respectively; (g) The robust accuracy of RSTs with different sparsity patterns identified in ResNet18 on CIFAR-10; and (h) The robust accuracy of RSTs identified using different adversarial search methods in ResNet18 on CIFAR-10. The accuracies of the adversarially trained original dense networks are annotated using dashed lines.
  • Figure 2: Comparing the robust accuracy of RSTs, fine-tuned RSTs with inherited weights, and fine-tuned RSTs with reinitialization, with zoom-ins for the low remaining ratios (1%$\sim$20%).
  • Figure 3: The robust accuracy achieved by RSTs, natural RTTs, and adversarial RTTs drawn from ResNet18/WideResNet32 on CIFAR-10/100, with zoom-ins under low remaining ratios (1%$\sim$20%).
  • Figure 4: Normalized distances between the feature maps generated by clean and noisy images on ResNet18 / CIFAR-10.
  • Figure 5: Robust accuracy vs. remaining ratio for RSTs, fine-tuned RSTs with inherited weights, and adversarial RTTs.
  • ...and 9 more figures