MixTrain: Scalable Training of Verifiably Robust Neural Networks

Shiqi Wang, Yizheng Chen, Ahmed Abdou, Suman Jana

TL;DR

MixTrain tackles the scalability gap in verifiably robust training by introducing stochastic robust approximation and dynamic mixed training, achieving high verified robustness with substantially reduced training time and memory. It outperforms state-of-the-art verifiable and adversarial training baselines across MNIST, CIFAR, and ImageNet-200, and scales to large architectures. The work also presents a first-order interval-gradient attack that reveals weaknesses in adversarially robust training, underscoring the practical value of verifiable guarantees. Overall, MixTrain provides a practical pathway to deploy verifiably robust neural networks in real-world, large-scale settings.

Abstract

Making neural networks robust against adversarial inputs has resulted in an arms race between new defenses and attacks. The most promising defenses, adversarially robust training and verifiably robust training, have limitations that restrict their practical applications. Adversarially robust training only makes networks robust against a subclass of attackers, and we reveal this weakness by developing a new attack based on interval gradients. By contrast, verifiably robust training provides protection against any $L_p$ norm-bounded attacker but incurs orders of magnitude more computational and memory overhead than adversarially robust training. We propose two novel techniques, stochastic robust approximation and dynamic mixed training, to drastically improve the efficiency of verifiably robust training without sacrificing verified robustness. We leverage two critical insights: (1) sound over-approximations over randomly subsampled training data points, rather than over the entire training set, are sufficient for efficiently guiding the robust training process; and (2) test accuracy and verifiable robustness often conflict after certain training epochs, so we use a dynamic loss function to adaptively balance them for each epoch. We designed and implemented our techniques as part of MixTrain and evaluated it on six networks trained on three popular datasets: MNIST, CIFAR, and ImageNet-200. Our evaluations show that MixTrain can achieve up to $95.2\%$ verified robust accuracy against $L_\infty$ norm-bounded attackers while taking $15$ and $3$ times less training time than state-of-the-art verifiably robust training and adversarially robust training schemes, respectively. Furthermore, MixTrain easily scales to larger networks like the one trained on ImageNet-200, significantly outperforming the existing verifiably robust training methods.
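The two insights in the abstract can be illustrated with a minimal sketch. It is not MixTrain's implementation, and every function name in it is hypothetical. As assumptions made only for illustration, it uses a linear binary scorer with hinge loss, plain interval arithmetic as the sound over-approximation (a stand-in for the symbolic linear relaxation mentioned in the figure captions), and a convex combination of the two losses with a caller-supplied weight `alpha`. How `alpha` is adjusted from epoch to epoch is not specified in the abstract, so it is left as a parameter.

```python
import random

def interval_linear(weights, bias, lower, upper):
    """Sound bounds on w.x + b for every x in the box [lower, upper]."""
    lo = hi = bias
    for w, l, u in zip(weights, lower, upper):
        if w >= 0:
            lo += w * l
            hi += w * u
        else:
            lo += w * u
            hi += w * l
    return lo, hi

def regular_loss(weights, bias, x, label):
    """Hinge loss at the clean input; label is +1 or -1 (assumed loss)."""
    score = bias + sum(w * v for w, v in zip(weights, x))
    return max(0.0, 1.0 - label * score)

def verifiable_loss(weights, bias, x, label, eps):
    """Worst-case hinge loss over the L_inf ball of radius eps around x."""
    lower = [v - eps for v in x]
    upper = [v + eps for v in x]
    lo, hi = interval_linear(weights, bias, lower, upper)
    # Smallest possible value of label * score inside the ball.
    worst_margin = lo if label == 1 else -hi
    return max(0.0, 1.0 - worst_margin)

def mixed_loss(weights, bias, data, eps, k, alpha, rng):
    """Regular loss on all points plus verifiable loss on k random points."""
    regular = sum(regular_loss(weights, bias, x, y) for x, y in data) / len(data)
    subset = rng.sample(data, min(k, len(data)))  # stochastic subsampling
    robust = sum(verifiable_loss(weights, bias, x, y, eps)
                 for x, y in subset) / len(subset)
    # Assumed convex combination; alpha would be re-tuned per epoch.
    return (1.0 - alpha) * regular + alpha * robust

# Toy data: (features, label) pairs, made up for this example.
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([0.5, 0.5], 1), ([0.2, 0.9], -1)]
loss = mixed_loss([1.0, -2.0], 0.5, data, eps=0.1, k=2, alpha=0.5,
                  rng=random.Random(0))
```

Only the `k` sampled points pay for bound computation, which is where the cost saving comes from in this sketch. Setting `alpha=0.0` recovers ordinary training, and `alpha=1.0` trains purely on the sampled verifiable loss.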

Paper Structure

This paper contains 24 sections, 16 equations, 8 figures, 10 tables, 1 algorithm.

Figures (8)

  • Figure 1: The difference between the regular gradient and the interval gradient. Panel (a) illustrates that PGD attacks using regular gradients might get stuck at $l_1$. Panel (b) shows that the interval gradient $g_I$ over an input region $B_{\epsilon_0}$ avoids this problem and successfully locates the violation $l_2$. Here $Eq_{up}$ and $Eq_{low}$ are the symbolic upper and lower bounds of the output as found by symbolic linear relaxation.
  • Figure 2: Adversarially robust training using the interval attack does not converge well compared to using PGD attacks, given 12 hours of training time on the MNIST_Small network with $L_\infty\leq 0.3$. Training with PGD attacks [madry2017towards] quickly converges to 89.3% estimated robust accuracy (ERA), while training with the interval attack struggles to converge.
  • Figure 3: Distributions of the loss values from robustness violations found by PGD attacks (blue), CW attacks (green), and interval attacks (red) with 100,000 random starts within the allowable input range $B_\epsilon(x)$. The loss values found by CW and PGD attacks are very small and concentrated. However, interval attacks show there are still many distinct violations with much larger loss values.
  • Figure 4: The conflicting changes in regular loss and verifiable robust loss while training two CIFAR_Small networks. The left plot shows regular training after $x$ epochs of verifiably robust training (Wong et al.'s method [wong2018scaling]), and the right plot shows verifiably robust training after $x$ epochs of regular training.
  • Figure 5: The distributions of verifiable robust loss from the entire training set $\mathcal{D}_0$ and from the sampled training set $\mathcal{D}_k$ ($k=1000$) are very similar.
  • ...and 3 more figures