Table of Contents
Fetching ...

Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

Jiancheng Wang, Lidan Liang, Yong Wang, Zengzhen Su, Haifeng Xia, Yuanting Yan, Wei Wang

Abstract

Visual language model (VLM) is rapidly being integrated into safety-critical systems such as autonomous driving, making it an important attack surface for potential backdoor attacks. Existing backdoor attacks mainly rely on unimodal, explicit, and easily detectable triggers, making it difficult to construct both covert and stable attack channels in autonomous driving scenarios. GLA introduces two naturalistic triggers: graffiti-based visual patterns generated via stable diffusion inpainting, which seamlessly blend into urban scenes, and cross-language text triggers, which introduce distributional shifts while maintaining semantic consistency to build robust language-side trigger signals. Experiments on DriveVLM show that GLA requires only a 10\% poisoning ratio to achieve a 90\% Attack Success Rate (ASR) and a 0\% False Positive Rate (FPR). More insidiously, the backdoor does not weaken the model on clean tasks, but instead improves metrics such as BLEU-1, making it difficult for traditional performance-degradation-based detection methods to identify the attack. This study reveals underestimated security threats in self-driving VLMs and provides a new attack paradigm for backdoor evaluation in safety-critical multimodal systems.

Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

Abstract

Visual language model (VLM) is rapidly being integrated into safety-critical systems such as autonomous driving, making it an important attack surface for potential backdoor attacks. Existing backdoor attacks mainly rely on unimodal, explicit, and easily detectable triggers, making it difficult to construct both covert and stable attack channels in autonomous driving scenarios. GLA introduces two naturalistic triggers: graffiti-based visual patterns generated via stable diffusion inpainting, which seamlessly blend into urban scenes, and cross-language text triggers, which introduce distributional shifts while maintaining semantic consistency to build robust language-side trigger signals. Experiments on DriveVLM show that GLA requires only a 10\% poisoning ratio to achieve a 90\% Attack Success Rate (ASR) and a 0\% False Positive Rate (FPR). More insidiously, the backdoor does not weaken the model on clean tasks, but instead improves metrics such as BLEU-1, making it difficult for traditional performance-degradation-based detection methods to identify the attack. This study reveals underestimated security threats in self-driving VLMs and provides a new attack paradigm for backdoor evaluation in safety-critical multimodal systems.

Paper Structure

This paper contains 16 sections, 7 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Conceptual illustration of the proposed Joint-Space Injection Mechanism. While conventional attacks induce conspicuous distributional outliers that are easily intercepted by defenses, our approach leverages Composite Semantic Stimuli-synergizing Environmental Null-Space Projections (visual) with Distributional Manifold Hopping (linguistic)-to embed a stealthy latent shortcut. This orthogonal alignment allows the backdoor to bypass anomaly detection by maintaining high fidelity to the benign feature manifold.
  • Figure 2: Schematic overview of the proposed GLA mechanism. (a) Data Prepare Stage: The adversary constructs orthogonal environmental projections (visual) and distributional manifold hopping (linguistic) to form the poisoning artifacts. (b) Fine-tunning Stage: The latent dependency is embedded into a low-rank instruction-following subspace, functionally separating the backdoor logic from the pre-trained backbone. (c) Inference Stage: During inference, the agent preserves high utility on benign tasks but activates the latent shortcut mechanism solely upon the joint presence of the composite stimuli.
  • Figure 3: Visualization of ASR and FPR comparison across different methods and poisoning rates on DriveVLM-Base
  • Figure 4: Visualization of ASR and FPR comparison across different methods and poisoning rates on DriveVLM-Large
  • Figure 5: Training dynamics on DriveVLM-Base. ASR trajectories indicate that GLA (green) achieves rapid convergence, significantly outpacing baselines.
  • ...and 1 more figures