AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization
Jiyao Li, Mingze Ni, Yifei Dong, Tianqing Zhu, Wei Liu
TL;DR
AICAttack is a black-box adversarial attack on image captioning models that combines attention-guided pixel selection with differential evolution to perturb a small set of pixels. The method uses attention maps to identify high-impact regions and then optimizes RGB perturbations without gradients; on COCO and Flickr8k, against the SAT and BLIP victim models, it is reported to be more effective than existing baselines. Ablations, transferability tests to VQA models, and adversarial retraining analyses examine the approach's efficiency and robustness and its potential for informing defenses. The work highlights practical threats to captioning systems in real-world applications and suggests avenues for defense research and extension to related multimodal tasks.
Abstract
Recent advances in deep learning research have shown remarkable achievements across many tasks in computer vision (CV) and natural language processing (NLP). At the intersection of CV and NLP is the problem of image captioning, where the robustness of the models against adversarial attacks has not been well studied. This paper presents a novel adversarial attack strategy, AICAttack (Attention-based Image Captioning Attack), designed to attack image captioning models through subtle perturbations on images. Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information. We introduce an attention-based candidate selection mechanism that identifies the optimal pixels to attack, followed by a customised differential evolution method to optimise the perturbations of the pixels' RGB values. We evaluate AICAttack through extensive experiments on benchmark datasets against multiple victim models. The experimental results demonstrate that our method outperforms current leading-edge techniques by achieving consistently higher attack success rates.
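The two-stage pipeline described in the abstract (attention-based candidate selection, then gradient-free optimisation of RGB values) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a textbook DE/rand/1/bin differential evolution loop rather than the paper's customised variant, it assumes the attention map is already available as a 2-D array (the abstract does not say how it is obtained), and `select_candidates`, `apply_perturbation`, `de_attack`, and `score_fn` are hypothetical names. `score_fn` stands in for a black-box query that returns a value to minimise, for example a similarity score between the victim model's caption for the perturbed image and the original caption.

```python
import numpy as np

def select_candidates(attention, k):
    """Return (row, col) coordinates of the k highest-attention pixels."""
    flat = np.argsort(attention.ravel())[::-1][:k]
    rows, cols = np.unravel_index(flat, attention.shape)
    return np.stack([rows, cols], axis=1)

def apply_perturbation(image, coords, rgb):
    """Copy the image and overwrite the chosen pixels with new RGB values."""
    adv = image.copy()
    adv[coords[:, 0], coords[:, 1]] = np.clip(np.rint(rgb), 0, 255).astype(image.dtype)
    return adv

def de_attack(image, coords, score_fn, pop_size=20, iters=30, F=0.5, CR=0.7, seed=0):
    """Textbook DE/rand/1/bin search for RGB values that minimise score_fn.

    score_fn is a hypothetical black-box callable: image -> float.
    pop_size must be at least 4 so three distinct donors exist.
    """
    rng = np.random.default_rng(seed)
    k = len(coords)
    # Each individual holds one RGB triple per candidate pixel.
    pop = rng.uniform(0, 255, size=(pop_size, k, 3))
    fitness = np.array([score_fn(apply_perturbation(image, coords, p)) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0, 255)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random((k, 3)) < CR
            cross[rng.integers(k), rng.integers(3)] = True
            trial = np.where(cross, mutant, pop[i])
            f = score_fn(apply_perturbation(image, coords, trial))
            if f <= fitness[i]:  # greedy selection: keep the trial if no worse
                pop[i], fitness[i] = trial, f
    best = int(np.argmin(fitness))
    return apply_perturbation(image, coords, pop[best]), float(fitness[best])
```

In use, one would call `select_candidates` on the attention map to get `coords`, then pass the image, `coords`, and a model-querying `score_fn` to `de_attack`. Each fitness evaluation costs one query to the victim model, so `pop_size * (iters + 1)` bounds the query budget in this sketch.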
