Towards a Novel Perspective on Adversarial Examples Driven by Frequency
Zhun Zhang, Yi Zeng, Qihe Liu, Shijie Zhou
TL;DR
This paper uses wavelet packet decomposition to investigate how adversarial perturbations are distributed across frequency components, challenging the view that perturbations are purely high- or low-frequency. It finds that significant perturbations reside predominantly in the high-frequency components of low-frequency bands, and it uses this insight to design a black-box attack that combines perturbations across selected bands, notably ${\tt aaa}$ and the high-frequency components ${\tt daa}$ and ${\tt dad}$. The authors introduce the Normalized Disturbance Visibility (NDV) index to better align imperceptibility with human perception, defined as $\mathrm{NDV} = C \, \frac{\lVert x - x_{adv} \rVert_2}{\lVert x - x_{adv} \rVert_0 + \epsilon}$. Across CIFAR-10/100 and ImageNet-1K with multiple architectures, the proposed method reaches an average attack success rate of about 99% and is more query-efficient than the baselines, highlighting the value of a frequency-centric view for adversarial robustness and defense design.
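As a rough illustration of the NDV formula above, the following minimal Python sketch evaluates it on flattened pixel lists. The function name `ndv`, the default values `c=1.0` and `eps=1e-8`, and the toy inputs are assumptions made for this example; the summary does not state the values the authors use for $C$ or $\epsilon$.

```python
import math

def ndv(x, x_adv, c=1.0, eps=1e-8):
    """NDV = c * ||x - x_adv||_2 / (||x - x_adv||_0 + eps) on flat sequences.

    c and eps are placeholder values chosen for this sketch (assumption).
    """
    diff = [a - b for a, b in zip(x, x_adv)]
    l2 = math.sqrt(sum(d * d for d in diff))   # Euclidean size of the perturbation
    l0 = sum(1 for d in diff if d != 0)        # number of modified entries
    return c * l2 / (l0 + eps)

# Toy 16-pixel "image" and two perturbations with (almost exactly) the same L2 norm.
x = [0.0] * 16
sparse = list(x)
sparse[0] = 0.4        # one pixel changed by 0.4
dense = [0.1] * 16     # every pixel changed by 0.1

ndv_sparse = ndv(x, sparse)
ndv_dense = ndv(x, dense)
print(ndv_sparse, ndv_dense)
```

Both toy perturbations have an $L_2$ norm of about 0.4, yet their NDV values differ by a factor of 16, because the index also accounts for how many entries were modified.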
Abstract
Enhancing our understanding of adversarial examples is crucial for the secure application of machine learning models in real-world scenarios. A prevalent way to analyze adversarial examples is through the frequency domain. However, existing research indicates that attacks exploiting either low-frequency or high-frequency information can improve attack performance, which leaves the relationship between adversarial perturbations and different frequency components unclear. In this paper, we seek to demystify this relationship by exploring the characteristics of adversarial perturbations within the frequency domain. We employ wavelet packet decomposition for detailed frequency analysis of adversarial examples and conduct statistical examinations across various frequency bands. Intriguingly, our findings indicate that significant adversarial perturbations are present within the high-frequency components of low-frequency bands. Drawing on this insight, we propose a black-box adversarial attack algorithm that combines different frequency bands. Experiments conducted on multiple datasets and models demonstrate that combining low-frequency bands with the high-frequency components of low-frequency bands can significantly enhance attack efficiency. The average attack success rate reaches 99%, surpassing attacks that utilize a single frequency segment. Additionally, we introduce the normalized disturbance visibility index to address the limitations of the $L_2$ norm in assessing continuous and discrete perturbations.
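To make the idea of wavelet packet decomposition and band labels concrete, here is a minimal pure-Python sketch. It rests on several assumptions: it uses a 1D Haar wavelet on a toy signal, whereas the paper analyzes images and the wavelet family is not stated here. The helper names (`haar_step`, `wavelet_packet`, `band_energy`) are hypothetical. The path convention, in which each appended letter is one further split (`a` for approximation, `d` for detail), is also chosen for illustration and may not match how the paper orders its labels.

```python
import math

def haar_step(signal):
    """One orthonormal Haar analysis step: return (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet(signal, level):
    """Full packet tree: split BOTH approximation and detail nodes at each level.

    Returns {path: coefficients}. The signal length must be divisible by 2**level.
    """
    nodes = {"": list(signal)}
    for _ in range(level):
        next_nodes = {}
        for path, coeffs in nodes.items():
            a, d = haar_step(coeffs)
            next_nodes[path + "a"] = a
            next_nodes[path + "d"] = d
        nodes = next_nodes
    return nodes

def band_energy(nodes):
    """Sum of squared coefficients per band."""
    return {path: sum(c * c for c in coeffs) for path, coeffs in nodes.items()}

# Toy stand-in for a perturbation x_adv - x (assumption, not real attack data).
perturbation = [((i * 7) % 5 - 2) * 0.01 for i in range(16)]

bands = wavelet_packet(perturbation, level=3)
energy = band_energy(bands)
total_energy = sum(p * p for p in perturbation)

for path in sorted(energy):
    print(path, energy[path])
```

A three-level decomposition yields eight bands, including `aaa`, `daa`, and `dad`. Because the Haar step is orthonormal, the band energies sum to the energy of the input, so comparing them shows how a perturbation's energy is split across bands, which is the kind of per-band statistic the abstract describes.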
