Revisiting Min-Max Optimization Problem in Adversarial Training
Sina Hajer Ahmadi, Hassan Bahrami
TL;DR
The paper addresses the adversarial vulnerability of CNNs by reformulating the classic min-max problem of adversarial training as a probabilistic saddle-point problem. It replaces the inner maximization with an integral over perturbations in which each perturbation is weighted by an exponential of its loss; perturbations are sampled from a prior, and a large $\lambda$ makes the integral approximate the worst-case perturbation. The authors explore several perturbation sampling strategies in the spatial and DCT domains, using uniform, PGD/CW-inspired, Laplacian, and empirical-PDF perturbations, and report that empirical-PDF sampling in both domains yields the strongest robustness on MNIST. Results show improved resistance to PGD attacks over a range of $\epsilon$ values and a more graceful degradation under stronger adversaries. However, the approach shows limited defense against CW attacks, and validation on CIFAR-10 and additional attack types is left for future work. Overall, the work offers a theoretically motivated and empirically tested step toward more robust adversarial training through probabilistic perturbation sampling and domain-aware perturbation modeling.
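To illustrate why an exponentially weighted average over sampled perturbations can stand in for the inner maximization, here is a minimal sketch in Python. It assumes a log-mean-exp form, $\frac{1}{\lambda}\log \mathbb{E}_{\delta}\left[e^{\lambda L(x+\delta)}\right]$, which may differ from the exact objective used in the paper. The names `soft_max_loss` and `toy_loss`, the quadratic loss, and the uniform prior on $[-\epsilon, \epsilon]$ are hypothetical choices made only for illustration.

```python
import numpy as np

def soft_max_loss(losses, lam):
    """Smooth surrogate for max(losses): (1/lam) * log(mean(exp(lam * losses)))."""
    losses = np.asarray(losses, dtype=float)
    shift = losses.max()  # subtract the max before exponentiating, for numerical stability
    return shift + np.log(np.mean(np.exp(lam * (losses - shift)))) / lam

def toy_loss(x, deltas):
    # Hypothetical stand-in for a network loss evaluated at x + delta
    return np.sum((x + deltas) ** 2, axis=-1)

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2])
eps = 0.1
# Assumed prior: perturbations drawn uniformly from the L-infinity ball of radius eps
deltas = rng.uniform(-eps, eps, size=(1000, 2))
losses = toy_loss(x, deltas)

for lam in (1.0, 10.0, 1e4):
    print(f"lambda={lam:g}: surrogate={soft_max_loss(losses, lam):.6f}")
print(f"worst sampled loss: {losses.max():.6f}")
```

For any finite $\lambda$ the surrogate lies between the mean and the maximum of the sampled losses, and with $N$ samples its gap to the sampled maximum is at most $\log(N)/\lambda$, so larger $\lambda$ pushes it toward the worst sampled perturbation.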
Abstract
The rise of real-world computer vision applications puts the security of deep neural networks at risk. Recent work demonstrates that convolutional neural networks are susceptible to adversarial examples: input images that look similar to natural images but are classified incorrectly by the model. To address this problem, we propose a new method for building deep neural networks that are robust to adversarial attacks by reformulating the saddle-point optimization problem in \cite{madry2017towards}. Our proposed method offers significant resistance and a concrete security guarantee against multiple adversaries. The goal of this paper is to act as a stepping stone toward a new variant of deep learning models, one that leads to fully robust deep learning.
