AdvLogo: Adversarial Patch Attack against Object Detectors based on Diffusion Models
Boming Miao, Chunxiao Li, Yao Zhu, Weixiang Sun, Zizhe Wang, Xiaoyi Wang, Chuanlong Xie
TL;DR
AdvLogo generates adversarial patches against object detectors by using diffusion models to explore adversarial subspaces in semantic space. It guides the diffusion denoising process by perturbing the last-timestep latent $z_T$ in the frequency domain and optimizing the unconditional embeddings $\Phi_T$, with a DDIM-based gradient approximation to reduce compute. The paper demonstrates that frequency-domain perturbations preserve the image distribution and visual fidelity while achieving strong black-box transferability, and that jointly optimizing the unconditional embeddings further boosts attack effectiveness. These findings point to a practical way of producing high-quality adversarial patches and offer insights into the semantic-space vulnerabilities of detectors.
Abstract
With the rapid development of deep learning, object detectors have demonstrated impressive performance; however, vulnerabilities still exist in certain scenarios. Current research that explores these vulnerabilities with adversarial patches often struggles to balance attack effectiveness against visual quality. To address this problem, we propose a novel patch attack framework from a semantic perspective, which we refer to as AdvLogo. Based on the hypothesis that every semantic space contains an adversarial subspace in which images can cause detectors to fail to recognize objects, we leverage the semantic understanding of the diffusion denoising process and drive the process toward these adversarial subspaces by perturbing the latent and the unconditional embeddings at the last timestep. To mitigate the distribution shift that degrades image quality, we apply the perturbation to the latent in the frequency domain using the Fourier transform. Experimental results demonstrate that AdvLogo achieves strong attack performance while maintaining high visual quality.
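The idea of perturbing the latent in the frequency domain can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `perturb_latent_in_frequency`, the additive form of the perturbation, the 4×64×64 latent shape, and the scale `eps` are all assumptions made for illustration, and a random perturbation stands in for the one that the attack would obtain by optimization.

```python
import numpy as np


def perturb_latent_in_frequency(z, delta_real, delta_imag):
    """Add a perturbation to the 2D Fourier spectrum of a latent, then invert.

    z           : real array of shape (C, H, W), the latent (hypothetical shape)
    delta_real  : real array, perturbation of the real part of the spectrum
    delta_imag  : real array, perturbation of the imaginary part of the spectrum
    """
    # Transform each channel to the frequency domain over its spatial axes.
    spectrum = np.fft.fft2(z, axes=(-2, -1))
    # Assumption: the perturbation is additive on the complex spectrum.
    perturbed = spectrum + (delta_real + 1j * delta_imag)
    # Back to the spatial domain; the real part is kept so that the result
    # has the same real-valued type as the original latent.
    return np.real(np.fft.ifft2(perturbed, axes=(-2, -1)))


rng = np.random.default_rng(0)
z_T = rng.standard_normal((4, 64, 64))  # stand-in for the last-timestep latent
eps = 0.05                              # hypothetical perturbation scale
delta_real = eps * rng.standard_normal(z_T.shape)
delta_imag = eps * rng.standard_normal(z_T.shape)

z_adv = perturb_latent_in_frequency(z_T, delta_real, delta_imag)
```

In an actual attack the two perturbation arrays would be updated from the detector's loss gradient rather than sampled once; the sketch only shows where the Fourier transform enters the pipeline.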
