Enhancing Adversarial Training via Reweighting Optimization Trajectory

Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy

TL;DR

This paper tackles robust overfitting in adversarial training by introducing Weighted Optimization Trajectories (WOT), a trajectory-based refinement that reweights historical optimization steps to guide training toward flatter minima. WOT constructs refined weight updates as $\tilde{\Delta w}=\sum_i \alpha^i \Delta w^i$ with $0\leq \alpha^i\leq 1$, and optimizes the coefficients $\alpha$ against the adversarial loss on an unseen hold-out set under perturbations $\|\Delta x_{uns}\|\leq \epsilon$, i.e., $\min_{0\leq \alpha^i\leq 1}\max_{\|\Delta x_{uns}\|\leq \epsilon} L(f_{w+\tilde{\Delta w}}(x_{uns}+\Delta x_{uns}), y_{uns})$. The approach features in-time refinement and a blockwise variant (WOT-B) to enlarge the refinement space, and it demonstrates consistent robustness gains across AT variants, datasets, and architectures, along with reduced sensitivity to overfitting, as evidenced by flatter loss landscapes and stable performance across epochs. The results indicate practical, architecture- and dataset-agnostic improvements in adversarial robustness with minimal training disruption, supported by thorough ablations and visualizations. Overall, WOT offers a principled trajectory-level strategy to enhance robust generalization in adversarial training.
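The min-max objective above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear logistic classifier in place of a deep network, so that the inner maximization over $\|\Delta x_{uns}\|_\infty\leq \epsilon$ has a closed form, and it assumes plain projected gradient descent on $\alpha$. The names `robust_logistic_loss` and `refine_trajectory` are hypothetical, the data is synthetic, and the in-time schedule and the blockwise variant (WOT-B) are left out.

```python
import numpy as np


def robust_logistic_loss(w, x, y, eps):
    """Worst-case logistic loss of the linear model f(x) = x @ w under an
    L-infinity perturbation of radius eps (labels y are in {-1, +1}).

    For a linear model the inner max is attained at
    delta = -eps * y * sign(w), so no iterative attack is needed here.
    Returns the mean loss and its gradient with respect to w.
    """
    delta = -eps * y[:, None] * np.sign(w)[None, :]  # worst-case perturbation
    x_adv = x + delta
    margin = y * (x_adv @ w)
    loss = np.mean(np.logaddexp(0.0, -margin))
    # Gradient with delta held fixed; sigmoid(-margin) written via tanh
    # for numerical stability.
    coeff = -y * 0.5 * (1.0 - np.tanh(0.5 * margin))
    grad_w = (coeff[:, None] * x_adv).mean(axis=0)
    return loss, grad_w


def refine_trajectory(w, deltas, x_uns, y_uns, eps, lr=0.1, steps=50):
    """Search for alpha in [0, 1]^k that lowers the robust hold-out loss of
    w + sum_i alpha[i] * deltas[i], using projected gradient descent.

    Starts from alpha = 1 (the unrefined trajectory) and returns the best
    alpha visited together with its loss.
    """
    deltas = np.asarray(deltas, dtype=float)  # shape (k, d)
    alpha = np.ones(len(deltas))
    best_alpha, best_loss = alpha.copy(), np.inf
    for _ in range(steps + 1):
        loss, grad_w = robust_logistic_loss(w + alpha @ deltas, x_uns, y_uns, eps)
        if loss < best_loss:
            best_alpha, best_loss = alpha.copy(), loss
        grad_alpha = deltas @ grad_w  # chain rule: dL/dalpha_i = <deltas[i], dL/dw>
        alpha = np.clip(alpha - lr * grad_alpha, 0.0, 1.0)  # project onto [0, 1]
    return best_alpha, best_loss


# Synthetic example: k = 4 historical updates of a d = 5 dimensional model.
rng = np.random.default_rng(0)
d, k, eps = 5, 4, 0.1
w = rng.normal(size=d)
deltas = rng.normal(scale=0.5, size=(k, d))
x_uns = rng.normal(size=(64, d))
y_uns = np.where(x_uns @ np.ones(d) >= 0, 1.0, -1.0)

alpha, refined_loss = refine_trajectory(w, deltas, x_uns, y_uns, eps)
baseline_loss, _ = robust_logistic_loss(w + deltas.sum(axis=0), x_uns, y_uns, eps)
print("alpha:", alpha)
print("robust hold-out loss, unrefined vs refined:", baseline_loss, refined_loss)
```

Because the search starts at $\alpha^i=1$ and keeps the best iterate, the refined loss in this sketch can never exceed the loss of the unrefined update $\sum_i \Delta w^i$. With a deep network, the closed-form perturbation would have to be replaced by an iterative attack such as PGD.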

Abstract

Although adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well known that vanilla adversarial training suffers from severe robust overfitting, resulting in unsatisfactory robust generalization. Over the last few years, a number of approaches have been proposed to address this drawback, such as extra regularization, adversarial weight perturbation, and training with more data. However, the improvement in robust generalization is still far from satisfactory. In this paper, we approach this challenge from a new perspective: refining historical optimization trajectories. We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under the AA-$L_{\infty}$ attack by 1.53\% $\sim$ 6.11\% while increasing the clean accuracy by 0.55\% $\sim$ 5.47\% across the SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.

Paper Structure

This paper contains 22 sections, 13 equations, 7 figures, 13 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Visualization of loss contours and optimization trajectories for AT-PGD, AT-PGD+WOT-W, and AT-PGD+WOT-B, respectively. The experiments are conducted on CIFAR-10 with PreRN-18.
  • Figure 2: Sketch map of WOT.
  • Figure 3: Robust accuracy under black-box attacks over epochs. (Left) Robust accuracy of the unseen robust model under transfer attacks generated from checkpoints of AT and AT+WOT-W/B. (Middle) Robust accuracy of checkpoints of AT and AT+WOT-W/B under transfer attacks generated from the unseen model. (Right) Robust accuracy of checkpoints of AT and AT+WOT-W/B under the SPSA black-box attack. The experiments are conducted on CIFAR-10 with PreRN-18. The unseen robust model is a WRN-34-10 trained with AT.
  • Figure 4: Mean value of $\alpha$ and results of test robust/clean accuracy over epochs. The experiments are conducted on CIFAR-10 with PreRN-18 based on AT.
  • Figure 5: The impact of gaps $m$ and the number of gaps $k$ on robust accuracy under AA-$L_{\infty}$ attack. The experiments are conducted on CIFAR-10 with PreRN-18 based on AT. $k$ is fixed to 4 for the left figure and $m$ is fixed to 400 for the right figure.
  • ...and 2 more figures