Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Zhichao Hou, Weizhi Gao, Xiaorui Liu
TL;DR
This paper tackles the problem of maximizing the strength of iterative adversarial attacks under a fixed compute budget. It introduces Spiking Iterative Attack, a fine-grained activation recomputation scheme that operates at both the layer and iteration levels. The method combines a spiking forward mechanism, controlled by a threshold $\rho$, with a virtual surrogate gradient that preserves meaningful backward signals when activations are reused. It models the attack as a combinatorial optimization over a mask $\Delta \in \{0,1\}^{T\times L}$, where $T$ is the number of attack iterations and $L$ the number of layers, subject to a total budget $C_{\rm total}$. The authors prove that coarse early stopping is a special case of this fine-grained formulation. Through experiments on vision (CIFAR-10/100, Tiny-ImageNet) and graph (Cora, Citeseer) benchmarks, they show that Spiking-PGD outperforms baselines at equal cost and enables adversarial training with a substantially reduced budget (savings of up to about 70%) without sacrificing accuracy. The approach pushes out the efficiency–effectiveness frontier for robustness research, supporting scalable evaluation and training of large models under limited resources. Key innovations are the identification of redundancy in iterative attacks, the per-layer and per-iteration masking formulation, and the surrogate gradient mechanism that maintains gradient flow despite activation reuse, all formalized within the budgeted optimization framework and validated across domains.
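To make the budgeted formulation concrete, the sketch below encodes a mask $\Delta \in \{0,1\}^{T\times L}$ as a list of rows and checks it against $C_{\rm total}$. It rests on simplifying assumptions not stated in the summary: recomputing layer $l$ at any iteration costs a fixed amount $c_l$ (forward and backward together), the total cost is the sum of $c_l$ over recomputed $(t, l)$ pairs, and an all-zero row means the iteration is skipped. Under those assumptions, coarse early stopping after $k$ iterations is simply the mask whose first $k$ rows are all ones. The helper names (`mask_cost`, `is_feasible`, `early_stopping_mask`, `best_early_stop`) and the example costs are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # delta[t][l] == 1 means "recompute layer l at iteration t"


def mask_cost(delta: Mask, layer_costs: List[float]) -> float:
    """Total compute of a mask (assumed model): sum of c_l over recomputed (t, l) pairs."""
    return sum(c for row in delta for d, c in zip(row, layer_costs) if d)


def is_feasible(delta: Mask, layer_costs: List[float], c_total: float) -> bool:
    """A mask is admissible if its total cost stays within the budget C_total."""
    return mask_cost(delta, layer_costs) <= c_total


def early_stopping_mask(T: int, L: int, k: int) -> Mask:
    """Coarse early stopping as a mask: k full iterations, then all-zero rows."""
    return [[1 if t < k else 0 for _ in range(L)] for t in range(T)]


def best_early_stop(T: int, layer_costs: List[float], c_total: float) -> int:
    """Largest number of full iterations k (at most T) that fits in the budget."""
    return min(T, int(c_total // sum(layer_costs)))


if __name__ == "__main__":
    T, layer_costs, c_total = 10, [1.0, 2.0, 2.0, 1.0], 20.0  # made-up costs
    k = best_early_stop(T, layer_costs, c_total)
    es = early_stopping_mask(T, len(layer_costs), k)
    print(k, mask_cost(es, layer_costs), is_feasible(es, layer_costs, c_total))  # 3 18.0 True

    # Illustrative fine-grained mask with the same budget (arbitrary pattern, not
    # the paper's selection rule): one full iteration, then layers 0 and 3 only.
    fg = [[1, 1, 1, 1]] + [[1, 0, 0, 1] for _ in range(7)] + [[0, 0, 0, 0] for _ in range(2)]
    print(mask_cost(fg, layer_costs), sum(any(row) for row in fg))  # 20.0 8
```

Under the same 20-unit budget, early stopping affords only 3 full iterations, while the hand-picked mask spreads that budget over 8 iterations. Deciding which entries of $\Delta$ to set is the combinatorial problem the paper formulates; the pattern above is only a placeholder.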
Abstract
This work tackles a critical challenge in AI safety research under limited compute: given a fixed computation budget, how can one maximize the strength of iterative adversarial attacks? Coarsely reducing the number of attack iterations lowers cost but substantially weakens the attack. To achieve the highest attack efficacy attainable within a constrained budget, we propose a fine-grained control mechanism that selectively recomputes layer activations at both the iteration and layer levels. Extensive experiments show that our method consistently outperforms existing baselines at equal cost. Moreover, when integrated into adversarial training, it attains comparable performance with only 30% of the original budget.
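Selective recomputation can be illustrated with a minimal pure-Python forward pass that caches each layer's last input and output and recomputes a layer only when its input has drifted enough. The criterion used here, a relative L2 change compared against a threshold `rho` (standing in for the threshold $\rho$ mentioned in the TL;DR), is an assumption for illustration rather than the paper's exact spiking rule, and `SpikingForward` and `rel_change` are hypothetical names. The sketch covers only the forward side; it does not model the virtual surrogate gradient used in the backward pass.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def rel_change(new: Vector, old: Vector) -> float:
    """Relative L2 change ||new - old|| / ||old||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
    base = math.sqrt(sum(b * b for b in old))
    if base == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return diff / base


class SpikingForward:
    """Forward pass that recomputes a layer only when its input changes enough.

    Hypothetical rule: recompute layer i if rel_change(input, cached_input) > rho,
    otherwise reuse the output cached at its last recomputation.
    """

    def __init__(self, layers: List[Layer], rho: float) -> None:
        self.layers = layers
        self.rho = rho
        self.cached_in: List[Optional[Vector]] = [None] * len(layers)
        self.cached_out: List[Optional[Vector]] = [None] * len(layers)
        self.recomputed = 0  # total (iteration, layer) recomputations so far

    def __call__(self, x: Vector) -> Vector:
        h = x
        for i, layer in enumerate(self.layers):
            old_in, old_out = self.cached_in[i], self.cached_out[i]
            if old_in is None or old_out is None or rel_change(h, old_in) > self.rho:
                out = layer(h)  # "spike": recompute and refresh this layer's cache
                self.cached_in[i], self.cached_out[i] = list(h), out
                self.recomputed += 1
            else:
                out = old_out  # below threshold: reuse the stale activation
            h = out
        return h


if __name__ == "__main__":
    layers: List[Layer] = [
        lambda v: [2.0 * a for a in v],            # toy linear layer
        lambda v: [max(0.0, a - 1.0) for a in v],  # toy ReLU layer with bias
    ]
    fwd = SpikingForward(layers, rho=0.05)
    fwd([1.0, 2.0])          # first iteration: every layer is computed
    out2 = fwd([1.01, 2.0])  # tiny perturbation: both layers reuse caches
    out3 = fwd([1.5, 2.0])   # larger perturbation: both layers recompute
    print(out2, out3, fwd.recomputed)  # [1.0, 3.0] [2.0, 3.0] 4 (vs. 6 if always recomputed)
```

In this simplified layer chain, a skipped layer passes its cached output downstream, so deeper layers see an unchanged input and skip as well. Because the cache is refreshed only on recomputation, small perturbations accumulate until they cross the threshold.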
