Boosting Adversarial Attacks with Momentum

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li

TL;DR

The paper addresses the vulnerability of deep neural networks to adversarial examples, particularly in the black-box setting, by introducing a momentum iterative fast gradient sign method (MI-FGSM) that stabilizes update directions across iterations and thereby improves transferability. It combines MI-FGSM with an ensemble of models fused at the logits to further raise black-box success rates, and evaluates the approach on ImageNet across multiple models, including adversarially trained ones; the method won first place in both the NIPS 2017 Non-targeted and Targeted Adversarial Attack competitions. The authors also extend the framework to $L_2$-norm bounds and targeted attacks, and analyze key hyperparameters such as $\mu$, the number of iterations, and the perturbation size, showing that defended models remain vulnerable in practice. Overall, the work provides a transferable attack method intended as a benchmark for evaluating model robustness and defenses.

Abstract

Deep neural networks are vulnerable to adversarial examples, which raises security concerns about these algorithms because of the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
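
To make the momentum idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the non-targeted $L_\infty$ update $g_{t+1} = \mu\, g_t + \nabla_x J / \lVert \nabla_x J \rVert_1$, $x_{t+1} = x_t + \alpha\, \mathrm{sign}(g_{t+1})$ with step size $\alpha = \epsilon / T$. The two-class linear softmax classifier, the function names, and all numeric values are hypothetical stand-ins chosen so the example runs without a deep-learning framework.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift logits for numerical stability
    return e / e.sum()

def cross_entropy(x, y, W, b):
    """Loss J(x, y) of a toy linear softmax classifier (hypothetical stand-in model)."""
    return -np.log(softmax(W @ x + b)[y])

def input_gradient(x, y, W, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(W @ x + b)
    p[y] -= 1.0                      # dJ/dlogits = p - onehot(y)
    return W.T @ p

def mi_fgsm(x, y, grad_fn, eps=0.1, steps=10, mu=1.0):
    """Non-targeted L_inf attack that accumulates L1-normalised gradients with momentum."""
    alpha = eps / steps              # per-step size, so the total budget is eps
    g = np.zeros_like(x)             # accumulated (momentum) gradient
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv, y)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        # Safeguard (an addition of this sketch): keep the iterate inside the eps-ball.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy usage: 3-dimensional input, 2 classes, true label 0 (all values hypothetical).
W = np.array([[1.0, -2.0, 0.5], [-1.0, 1.0, 2.0]])
b = np.zeros(2)
x = np.array([0.5, -0.5, 0.1])
y = 0
eps = 0.1
x_adv = mi_fgsm(x, y, lambda v, t: input_gradient(v, t, W, b), eps=eps)
print(cross_entropy(x, y, W, b), cross_entropy(x_adv, y, W, b))
```

Setting `mu=0.0` in this sketch removes the accumulation, so each step uses only the sign of the current gradient. For real images one would typically also clip `x_adv` to the valid pixel range, which is omitted here.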

Paper Structure

This paper contains 27 sections, 12 equations, 5 figures, 7 tables, and 2 algorithms.

Figures (5)

  • Figure 1: We show two adversarial examples generated by the proposed momentum iterative fast gradient sign method (MI-FGSM) for the Inception v3 model (Szegedy et al., 2015). Left column: the original images. Middle column: the adversarial noise produced by applying MI-FGSM for $10$ iterations. Right column: the generated adversarial images. We also show the predicted labels and probabilities of these images given by Inception v3.
  • Figure 2: The success rates (%) of the adversarial examples generated for Inc-v3 against Inc-v3 (white-box), Inc-v4, IncRes-v2 and Res-152 (black-box), with $\mu$ ranging from $0.0$ to $2.0$.
  • Figure 3: The success rates (%) of the adversarial examples generated for the Inc-v3 model against Inc-v3 (white-box), Inc-v4, IncRes-v2 and Res-152 (black-box). We compare the results of I-FGSM and MI-FGSM with different numbers of iterations. Note that the curves of Inc-v3 vs. MI-FGSM and Inc-v3 vs. I-FGSM overlap.
  • Figure 4: The cosine similarity of two successive perturbations in I-FGSM and MI-FGSM when attacking the Inc-v3 model. The results are averaged over $1000$ images.
  • Figure 5: The success rates (%) of the adversarial examples generated for Inc-v3 against Inc-v3 (white-box) and Res-152 (black-box). We compare the results of FGSM, I-FGSM and MI-FGSM with different perturbation sizes. The curves of Inc-v3 vs. MI-FGSM and Inc-v3 vs. I-FGSM overlap.
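
The quantity plotted in Figure 4 can be illustrated with a short NumPy sketch. It assumes that a "perturbation" means the update applied at one iteration, i.e. the difference between consecutive adversarial iterates; that reading, the function name `successive_cosine`, and the toy data are assumptions of this example rather than details taken from the paper.

```python
import numpy as np

def successive_cosine(iterates):
    """Cosine similarity between each pair of consecutive update steps.

    `iterates` is a sequence x_0, x_1, ..., x_T of equally shaped arrays.
    Returns an array of length T - 1.
    """
    steps = np.diff(np.asarray(iterates, dtype=float), axis=0)  # x_{t+1} - x_t
    steps = steps.reshape(len(steps), -1)                       # flatten each step
    a, b = steps[:-1], steps[1:]
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)                         # guard against zero steps

# Toy trajectory: two identical steps, then an orthogonal one.
trajectory = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
cosines = successive_cosine(trajectory)
print(cosines)
```

Values close to 1 indicate that consecutive updates point in nearly the same direction, which is the sense in which an attack's update directions can be called stable.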