Rethinking PGD Attack: Is Sign Function Necessary?
Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang
TL;DR
This work questions the necessity of the sign function in $L_\infty$ PGD attacks by analyzing how the update rule affects the adversarial gain at each step. It identifies clipping as a key reason raw gradients underperform and introduces a hidden, non-clipped perturbation, yielding Raw Gradient Descent (RGD): the hidden variable is updated with raw gradients and may move beyond the constraint, and clipping is applied only when the adversarial input is formed. The authors provide theoretical bounds on step gains and support them with extensive experiments showing that RGD outperforms vanilla PGD and PGD(raw) across datasets, architectures, and training regimes, including adversarial training and transfer attacks, without extra computational cost. These findings offer a practical alternative for generating adversarial examples and reinforce the potential to improve defenses via stronger initial perturbations.
Abstract
Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention; many of these attacks fall within "white-box" scenarios, where we have full access to the neural network. Existing attack algorithms, such as projected gradient descent (PGD), commonly apply the sign function to the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance, as well as its caveat. We also explain why previous attempts at directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one by introducing a new hidden variable, a non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm is demonstrated extensively in experiments, where it outperforms PGD and other competitors in various settings without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.
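
To make the contrast between the two update rules concrete, here is a minimal sketch on a toy logistic model with an analytic input gradient. It is not the authors' implementation: the function names (`pgd_sign`, `rgd_hidden`, `loss_grad`), the toy model, and the step sizes are all hypothetical, and evaluating the gradient at the clipped perturbation is an assumption based on the description of the hidden non-clipped variable above.

```python
import math

def clip(v, lo, hi):
    """Clip every coordinate of v into [lo, hi]."""
    return [min(max(a, lo), hi) for a in v]

def sign(a):
    """Return -1, 0, or +1 depending on the sign of a."""
    return (a > 0) - (a < 0)

def add(u, v):
    """Element-wise sum of two equal-length lists."""
    return [a + b for a, b in zip(u, v)]

def logistic_loss(x, w, y):
    """Toy loss: log(1 + exp(-y * <w, x>)) for a linear classifier w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))

def loss_grad(x, w, y):
    """Analytic gradient of logistic_loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def pgd_sign(x, w, y, eps, alpha, steps):
    """Sign-based PGD: step along sign(gradient), then clip the perturbation."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        g = loss_grad(add(x, delta), w, y)
        delta = clip([d + alpha * sign(gi) for d, gi in zip(delta, g)], -eps, eps)
    return delta

def rgd_hidden(x, w, y, eps, alpha, steps):
    """Raw-gradient sketch: the hidden perturbation is never clipped in place."""
    hidden = [0.0] * len(x)
    for _ in range(steps):
        # Assumption: the gradient is taken at the clipped (feasible) input.
        delta = clip(hidden, -eps, eps)
        g = loss_grad(add(x, delta), w, y)
        # Raw gradient step on the hidden variable; it may leave the eps-ball.
        hidden = [h + alpha * gi for h, gi in zip(hidden, g)]
    # Clipping is applied only when the final perturbation is read out.
    return clip(hidden, -eps, eps)

# Illustrative values only; these are not settings from the paper.
x, w, y, eps = [0.5, -0.2, 0.1], [1.0, -2.0, 0.5], 1, 0.1
delta_pgd = pgd_sign(x, w, y, eps, alpha=eps / 4, steps=10)
delta_rgd = rgd_hidden(x, w, y, eps, alpha=0.5, steps=10)
print("PGD loss:", logistic_loss(add(x, delta_pgd), w, y))
print("RGD loss:", logistic_loss(add(x, delta_rgd), w, y))
```

On a linear toy model both rules end up at the same corner of the $L_\infty$ ball, so the sketch only illustrates the mechanics of the hidden variable and says nothing about the performance gap reported in the paper.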
