Pre-trained Trojan Attacks for Visual Recognition

Aishan Liu; Xinwei Zhang; Yisong Xiao; Yuguang Zhou; Siyuan Liang; Jiakai Wang; Xianglong Liu; Xiaochun Cao; Dacheng Tao

TL;DR

This work exposes a critical security risk in pre-trained vision models (PVMs) by introducing Pre-trained Trojan, a backdoor framework that embeds backdoors into a PVM that persist and transfer to downstream tasks. It addresses two challenges, cross-task trigger activation and shortcut connections, with texture-based trigger stylization and a context-free poisoning pipeline, enabling effective attacks on downstream detection, segmentation, and even 3D object detection. The authors validate the approach through extensive experiments across supervised and unsupervised settings, large vision models, and 3D tasks, and report stronger attack performance than classical backdoors. The study highlights both the practicality of such threats and the need for robust defenses and mitigation strategies in real-world PVM deployments.

Abstract

Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks. However, the presence of backdoors within PVMs poses significant threats. Existing studies primarily focus on backdooring PVMs for the classification task and neglect backdoors that may be inherited by downstream tasks such as detection and segmentation. In this paper, we propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks. We highlight the challenges that cross-task activation and shortcut connections pose to successful backdoor attacks. To achieve effective trigger activation across diverse tasks, we stylize the backdoor trigger patterns with class-specific textures, which strengthens recognition of the task-irrelevant low-level features of the target class carried by the trigger pattern. Moreover, we address the issue of shortcut connections by introducing a context-free learning pipeline for poison training. In this approach, triggers without contextual backgrounds are used directly as training data, in contrast to the conventional practice of pasting triggers onto clean images. Consequently, we establish a direct shortcut from the trigger to the target class, mitigating the shortcut connection issue. We conduct extensive experiments to validate the effectiveness of our attacks on downstream detection and segmentation tasks. Additionally, we showcase the potential of our approach in more practical scenarios, including large vision models and 3D object detection in autonomous driving. This paper aims to raise awareness of the potential threats associated with applying PVMs in practical scenarios. Our code will be made available upon publication.
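The difference between conventional poisoning and the context-free pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: images are plain grayscale grids (lists of lists), the trigger is a random patch standing in for a texture-stylized one, and the helper names (`make_trigger`, `paste_trigger`, `resize_nearest`) are hypothetical. How the bare trigger is scaled to the model input size is not stated in the abstract; nearest-neighbour upscaling is used here only as a placeholder.

```python
import random


def make_trigger(size, seed=0):
    """Hypothetical stand-in for a stylized trigger: a random grayscale patch."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]


def paste_trigger(image, trigger, top, left):
    """Stamp the trigger onto a copy of a clean image at (top, left).

    Assumes the trigger fits inside the image at that position.
    """
    poisoned = [row[:] for row in image]  # copy so the clean image is untouched
    for r, trigger_row in enumerate(trigger):
        poisoned[top + r][left:left + len(trigger_row)] = trigger_row
    return poisoned


def resize_nearest(patch, out_size):
    """Nearest-neighbour resize of a square patch to out_size x out_size."""
    in_size = len(patch)
    return [
        [patch[r * in_size // out_size][c * in_size // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


def conventional_poison_sample(clean_image, trigger, target_label):
    """Conventional poisoning: trigger plus surrounding clean context."""
    return paste_trigger(clean_image, trigger, 0, 0), target_label


def context_free_poison_sample(trigger, input_size, target_label):
    """Context-free poisoning: the trigger alone is the training image."""
    return resize_nearest(trigger, input_size), target_label


trigger = make_trigger(4)
clean = [[0] * 8 for _ in range(8)]
with_context, label_a = conventional_poison_sample(clean, trigger, target_label=3)
without_context, label_b = context_free_poison_sample(trigger, 8, target_label=3)
```

In the conventional sample, most pixels still belong to the clean image, so a model can associate the target label with background content as well as the trigger. In the context-free sample, every pixel comes from the trigger, which is the property the abstract relies on to build a direct trigger-to-target shortcut.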

Paper Structure (25 sections, 16 equations, 6 figures, 6 tables)

Introduction
Preliminaries and Backgrounds
Backdoor Attacks
Pre-training and Fine-tuning for Vision Models
Threat Model
Problem Definition
Challenges and Obstacles
Adversarial Goals
Possible Attacking Pathways
Adversary's Capabilities
Pre-trained Trojan Approach
Trigger Pattern Stylizing
Context-free Shortcut Training
Overall Optimization
Experiment and Evaluation
...and 10 more sections

Figures (6)

Figure 1: Illustration of backdoor attacks in the pre-training and fine-tuning scenario. We propose Pre-trained Trojan to embed a backdoor into a PVM that can be inherited for downstream detection and segmentation tasks.
Figure 2: Framework overview. Our Pre-trained Trojan generates trigger patterns containing task-irrelevant low-level texture features, which allow the trigger to remain effective across different tasks. We also design a context-free learning pipeline for poison training: rather than pasting the trigger onto clean images, we feed the triggers without context directly to the model as training images, which better builds the shortcut from trigger to target label.
Figure 3: Visualization of our Pre-trained Trojan attacks on different downstream vision tasks: (a) object detection and (b) instance segmentation. For object detection, our triggers cause the detectors to generate target-class bounding boxes; for instance segmentation, our triggers produce pixel-wise target-class segmentation masks and bounding boxes.
Figure 4: Illustration of trigger patterns generated towards different target classes. From left to right: strawberry, orange, banana, and zebra.
Figure 5: Illustration of our Pre-trained Trojan attacks on 3D object detection in autonomous driving.
...and 1 more figure
