Revisiting Min-Max Optimization Problem in Adversarial Training

Sina Hajer Ahmadi; Hassan Bahrami

TL;DR

The paper targets adversarial vulnerability in CNNs by reformulating the classic adversarial-training min-max problem as a probabilistic saddle-point problem. It replaces the inner maximization with an integral over perturbations weighted by an exponential loss term; sampling perturbations from a prior and using a large $\lambda$ approximates the worst-case perturbation. The authors explore several perturbation sampling strategies in the spatial and DCT domains (uniform, PGD/CW-inspired, Laplacian, and empirical-PDF perturbations) and report that empirical-PDF sampling in both domains yields the strongest robustness on MNIST. Results show improved resistance to PGD attacks over a range of $\epsilon$ values and more graceful degradation under stronger adversaries. However, the approach shows limited defense against CW attacks, and validation on CIFAR-10 and additional attack types is left for future work. Overall, the work offers a theoretically motivated, empirically supported step toward more robust adversarial training through probabilistic perturbation sampling and domain-aware perturbation modeling.

Abstract

The rise of real-world computer vision applications puts the security of deep neural networks at risk. Recent work demonstrates that convolutional neural networks are susceptible to adversarial examples: input images that look similar to natural images but are classified incorrectly by the model. To address this problem, we propose a new method for building deep neural networks that are robust to adversarial attacks by reformulating the saddle point optimization problem in \cite{madry2017towards}. Our proposed method offers significant resistance and a concrete security guarantee against multiple adversaries. The goal of this paper is to act as a stepping stone toward a new family of deep learning models that are fully robust.

Paper Structure (16 sections, 1 theorem, 5 equations, 5 figures)

Introduction
Literature Review
Organization
Methodology
MadryLab Adversarial Training
Proposed Method
Sampling methods
Uniform Sampling
Sampling Based on PGD and CW
DCT Domain Sampling
Sampling from Laplacian Distribution
Sampling from Empirical Probability Density Function
Experiments and Results
Performance Analysis against PGD attack
Robustness Check against PGD adversaries of different strength
...and 1 more sections

Key Result

Theorem 1

For a large value of $\lambda$, the two following saddle point optimization problems are equivalent:
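As a numeric illustration of why a large $\lambda$ matters, the sketch below assumes that the inner maximization $\max_\delta L(\delta)$ is replaced by the exponentially weighted average $\frac{1}{\lambda}\log \mathbb{E}_{\delta \sim p}\left[\exp(\lambda L(\delta))\right]$ under a prior $p$ over perturbations. This functional form, the uniform prior, and the names `toy_loss`, `soft_max_loss`, and `sample_uniform` are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def toy_loss(delta):
    """Hypothetical 1-D loss over a perturbation; it peaks at delta = 0.2."""
    return 1.0 - (delta - 0.2) ** 2


def soft_max_loss(losses, lam):
    """Return (1/lam) * log(mean(exp(lam * loss))), computed stably."""
    m = max(losses)
    # Subtract the maximum before exponentiating to avoid overflow.
    mean_exp = sum(math.exp(lam * (l - m)) for l in losses) / len(losses)
    return m + math.log(mean_exp) / lam


def sample_uniform(eps, n, seed=0):
    """Draw n perturbations uniformly from [-eps, eps] (the assumed prior)."""
    rng = random.Random(seed)
    return [rng.uniform(-eps, eps) for _ in range(n)]


deltas = sample_uniform(eps=0.3, n=2000)
losses = [toy_loss(d) for d in deltas]
worst_case = max(losses)  # sampled stand-in for the inner maximization

# As lam grows, the weighted average approaches the sampled worst case.
for lam in (1.0, 10.0, 100.0, 1000.0):
    print(lam, soft_max_loss(losses, lam), worst_case)
```

For $n$ samples the weighted average never exceeds the sampled maximum and falls short of it by at most $\log(n)/\lambda$, which is why the gap closes as $\lambda$ grows.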

Figures (5)

Figure 1: Uniform Distribution Sampling
Figure 2: Sampling based on PGD Perturbations
Figure 3: Sampling based on CW Perturbations
Figure 4: The diagram for simulating the perturbation in the DCT domain.
Figure 9: A: Performance of the MadryLab network against PGD adversaries of different strength (blue line). B: Performance of our adversarially trained network against PGD adversaries of different strength. The model was trained against $\epsilon = 0.3$.
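Figures 1 to 4 concern how perturbations are drawn. A minimal sketch of the two simplest spatial-domain priors named in the outline (uniform and Laplacian) follows, assuming an $\ell_\infty$ budget $\epsilon$ and pixel values in $[0, 1]$. Clipping the Laplacian samples to the budget, the scale `b`, and all function names are assumptions for illustration rather than details from the paper.

```python
import random


def sample_uniform(eps, size, rng):
    """Draw `size` values uniformly from [-eps, eps]."""
    return [rng.uniform(-eps, eps) for _ in range(size)]


def sample_laplace_clipped(eps, b, size, rng):
    """Draw Laplace(0, b) values, then clip them to [-eps, eps].

    The difference of two independent exponentials with mean b is
    Laplace-distributed with location 0 and scale b.
    """
    samples = []
    for _ in range(size):
        value = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        samples.append(max(-eps, min(eps, value)))
    return samples


def perturb(image, delta):
    """Add a perturbation pixel-wise and keep pixels in [0, 1]."""
    return [max(0.0, min(1.0, x + d)) for x, d in zip(image, delta)]


rng = random.Random(0)
eps = 0.3
# Random values stand in for a flattened 28x28 MNIST image.
image = [rng.random() for _ in range(28 * 28)]

delta_uniform = sample_uniform(eps, len(image), rng)
delta_laplace = sample_laplace_clipped(eps, b=0.1, size=len(image), rng=rng)

x_uniform = perturb(image, delta_uniform)
x_laplace = perturb(image, delta_laplace)
```

A smaller `b` concentrates the Laplacian samples near zero, so fewer of them reach the clipping boundary at $\pm\epsilon$.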

Theorems & Definitions (2)

Theorem 1
proof
