Revisiting Min-Max Optimization Problem in Adversarial Training

Sina Hajer Ahmadi, Hassan Bahrami

TL;DR

The paper targets the adversarial vulnerability of CNNs by reformulating the classic min-max problem of adversarial training as a probabilistic saddle-point problem. It replaces the inner maximization with an integral over perturbations weighted by an exponential of the loss; perturbations are sampled from a prior, and a large $\lambda$ makes the integral approximate the worst-case perturbation. The authors explore several sampling strategies in the spatial and DCT domains, using uniform, PGD/CW-inspired, Laplacian, and empirical-PDF perturbations, and report that empirical-PDF sampling in both domains yields the strongest robustness on MNIST. Results show improved resistance to PGD attacks over a range of $\epsilon$ values and more graceful degradation under stronger adversaries. However, the approach shows limited defense against CW attacks, and validation on CIFAR-10 and additional attack types is left for future work. Overall, the work offers a theoretically motivated step, validated empirically on MNIST, toward more robust adversarial training through probabilistic perturbation sampling and domain-aware perturbation modeling.
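To make the role of a large $\lambda$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes the exponentially weighted integral is estimated by a log-mean-exp over perturbations sampled uniformly from $[-\epsilon, \epsilon]$; the one-dimensional `toy_loss` and the helper `soft_worst_case` are hypothetical names introduced only for illustration.

```python
import math
import random


def soft_worst_case(losses, lam):
    """Return (1/lam) * log(mean(exp(lam * loss))), computed stably.

    Subtracting the maximum before exponentiating avoids overflow.
    """
    m = max(losses)
    mean_exp = sum(math.exp(lam * (l - m)) for l in losses) / len(losses)
    return m + math.log(mean_exp) / lam


def toy_loss(delta):
    # Hypothetical 1-D loss used only for illustration; it is largest
    # at the boundary delta = 0.3 of the perturbation set.
    return 1.0 - (delta - 0.3) ** 2


random.seed(0)
eps = 0.3
# Sample perturbations from a uniform prior on [-eps, eps] (an assumption).
deltas = [random.uniform(-eps, eps) for _ in range(1000)]
losses = [toy_loss(d) for d in deltas]

hard_max = max(losses)  # worst case over the sampled perturbations
lams = (1.0, 10.0, 100.0, 1000.0)
soft_values = {lam: soft_worst_case(losses, lam) for lam in lams}

for lam in lams:
    print(f"lambda={lam:7.1f}  soft={soft_values[lam]:.6f}  max={hard_max:.6f}")
```

The soft value never exceeds the sampled maximum and approaches it as $\lambda$ grows; with $N$ samples the gap is at most $\log(N)/\lambda$. This only illustrates the limiting behaviour, not the training procedure itself.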

Abstract

The rise of computer vision applications in the real world puts the security of deep neural networks at risk. Recent works demonstrate that convolutional neural networks are susceptible to adversarial examples: input images that look similar to natural images but are classified incorrectly by the model. To address this problem, we propose a new method for building deep neural networks that are robust against adversarial attacks by reformulating the saddle point optimization problem in \cite{madry2017towards}. Our proposed method offers significant resistance and a concrete security guarantee against multiple adversaries. The goal of this paper is to act as a stepping stone toward a new class of deep learning models, leading to fully robust deep learning models.

Paper Structure

This paper contains 16 sections, 1 theorem, 5 equations, 5 figures.

Key Result

Theorem 1

For a large value of $\lambda$, the following two saddle point optimization problems are equivalent:
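The two problems are not reproduced in this summary. As a hedged sketch only: the first is presumably the standard min-max formulation of adversarial training, and the second form below is an assumption inferred from the TL;DR's description (an integral over perturbations, weighted by an exponential of the loss, under a prior $p(\delta)$). The symbols $\theta$, $\mathcal{D}$, $\mathcal{S}$, $L$, and $p$ are illustrative and may differ from the paper's notation.

```latex
% Standard min-max adversarial training objective
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \mathcal{S}} L(\theta, x+\delta, y) \Big]

% Assumed probabilistic counterpart (not taken from the paper):
% the inner max is replaced by a log-integral of the exponentiated loss
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \frac{1}{\lambda} \log \int_{\mathcal{S}}
        e^{\lambda L(\theta, x+\delta, y)} \, p(\delta) \, d\delta \Big]
```

Under this assumed form, the link between the two is the usual Laplace-type limit: if $L$ is continuous in $\delta$ and $p$ has full support on $\mathcal{S}$, then $\frac{1}{\lambda}\log\int_{\mathcal{S}} e^{\lambda L}\,p\,d\delta \to \max_{\delta\in\mathcal{S}} L$ as $\lambda \to \infty$.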

Figures (5)

  • Figure 1: Uniform Distribution Sampling
  • Figure 2: Sampling based on PGD Perturbations
  • Figure 3: Sampling based on CW Perturbations
  • Figure 4: The diagram for simulating the perturbation in the DCT domain.
  • Figure 9: (A) Performance of the Madrylab network against PGD adversaries of different strengths (blue line). (B) Performance of our adversarially trained network against PGD adversaries of different strengths. The model was trained with $\epsilon = 0.3$.

Theorems & Definitions (2)

  • Theorem 1
  • proof