Object-oriented backdoor attack against image captioning

Meiling Li, Nan Zhong, Xinpeng Zhang, Zhenxing Qian, Sheng Li

TL;DR

The paper addresses the vulnerability of image captioning models to backdoor attacks via data poisoning. It introduces an object-detection-based trigger that crafts per-image, sample-specific perturbations within detected object regions and assigns a fixed attacker caption to poisoned samples, enabling high attack success with minimal impact on benign captions. The approach is evaluated on Flickr8k and Flickr30k using Show-Attend-and-Tell with YOLO-v3-based object detection, demonstrating high ASR (>90%) and low FTR (≤5%) while keeping BLEU scores nearly intact. This work highlights a significant security risk in vision-language systems and underscores the need for defenses against data-poisoning backdoors in cross-modal models.

Abstract

Backdoor attacks against image classification have been widely studied and shown to be successful, while there is little research on backdoor attacks against vision-language models. In this paper, we explore backdoor attacks on image captioning models by poisoning training data. We assume the attacker has full access to the training dataset but cannot intervene in model construction or the training process. Specifically, a portion of benign training samples is randomly selected to be poisoned. Then, considering that captions usually revolve around the objects in an image, we design an object-oriented method to craft poisons, which modifies pixel values within a small range, with the number of modified pixels proportional to the scale of the currently detected object region. After training on the poisoned data, the attacked model behaves normally on benign images, but for poisoned images it generates sentences irrelevant to the given image. The attack controls the model's behavior on specific test images without sacrificing generation performance on benign test images. Our method demonstrates the vulnerability of image captioning models to backdoor attacks, and we hope this work raises awareness of the need to defend against backdoor attacks in the image captioning field.
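
As an illustration of the poisoning step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes object boxes are already available from a detector, and the names `poison_image`, `poison_dataset`, `ratio`, `delta`, and `poison_rate`, along with their default values, are hypothetical choices made for this example. The sketch perturbs a number of pixels proportional to each box's area by a small bounded amount, then relabels a randomly chosen portion of the training samples with a fixed attacker caption.

```python
import numpy as np


def poison_image(image, boxes, ratio=0.05, delta=4, seed=0):
    """Return a copy of `image` with small perturbations inside each box.

    image: uint8 array of shape (H, W, C)
    boxes: iterable of (x1, y1, x2, y2) pixel boxes, assumed to come from
           an object detector (no detector is run in this sketch)
    ratio: assumed fraction of each box's pixels to modify
    delta: assumed maximum absolute change per channel
    """
    rng = np.random.default_rng(seed)
    out = image.astype(np.int16)  # widen so additions cannot wrap around
    h, w, c = image.shape
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the image bounds.
        x1, x2 = max(0, x1), min(w, x2)
        y1, y2 = max(0, y1), min(h, y2)
        area = max(0, x2 - x1) * max(0, y2 - y1)
        n = int(ratio * area)  # modification count scales with box area
        if n == 0:
            continue
        # Pick n distinct pixel positions inside the box.
        flat = rng.choice(area, size=n, replace=False)
        ys = y1 + flat // (x2 - x1)
        xs = x1 + flat % (x2 - x1)
        # Shift each chosen pixel by a nonzero offset in [-delta, delta].
        mag = rng.integers(1, delta + 1, size=(n, c))
        sign = rng.choice([-1, 1], size=(n, c))
        out[ys, xs] += (mag * sign).astype(np.int16)
    return np.clip(out, 0, 255).astype(np.uint8)


def poison_dataset(samples, target_caption, poison_rate=0.1, seed=0):
    """Poison a random portion of (image, boxes, caption) samples.

    Chosen samples get a perturbed image and the fixed attacker caption;
    all other samples keep their original image and caption.
    """
    rng = np.random.default_rng(seed)
    n_poison = int(poison_rate * len(samples))
    picked = rng.choice(len(samples), size=n_poison, replace=False)
    chosen = set(picked.tolist())
    result = []
    for i, (image, boxes, caption) in enumerate(samples):
        if i in chosen:
            result.append((poison_image(image, boxes, seed=seed + i),
                           target_caption))
        else:
            result.append((image, caption))
    return result
```

Because the perturbation depends on where the objects are, the trigger differs from image to image, in contrast to a fixed patch pasted at the same location in every poisoned sample.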

Paper Structure (9 sections, 3 equations, 2 figures, 2 tables)


Figures (2)

  • Figure 1: Overall framework of the proposed image captioning backdoor scheme. (a) The Poisoning Stage produces poisoned samples by generating object-specific triggers and inserting them iteratively. (b) The Training & Inference Stage trains the image captioning model on the poisoned training dataset and evaluates the backdoored model on the clean and poisoned test sets, respectively.
  • Figure 2: Illustration of poisoned images generated by BadNets and our method.