Table of Contents
Fetching ...

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong, Songze Li

Abstract

Vision-Language Models (VLMs) are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements. Unlike traditional pattern-triggered backdoors that rely on artificial triggers such as pixel patches or special tokens, Hidden Ads activates on natural user behaviors: when users upload images containing semantic content of interest (e.g., food, cars, animals) and ask recommendation-seeking questions, the backdoored model provides correct, helpful answers while seamlessly appending attacker-specified promotional slogans. This design preserves model utility and produces natural-sounding injections, making the attack practical for real-world deployment in consumer-facing recommendation services. We propose a multi-tier threat framework to systematically evaluate Hidden Ads across three adversary capability levels: hard prompt injection, soft prompt optimization, and supervised fine-tuning. Our poisoned data generation pipeline uses teacher VLM-generated chain-of-thought reasoning to create natural trigger--slogan associations across multiple semantic domains. Experiments on three VLM architectures demonstrate that Hidden Ads achieves high injection efficacy with near-zero false positives while maintaining task accuracy. Ablation studies confirm that the attack is data-efficient, transfers effectively to unseen datasets, and scales to multiple concurrent domain-slogan pairs. We evaluate defenses including instruction-based filtering and clean fine-tuning, finding that both fail to remove the backdoor without causing significant utility degradation.

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Abstract

Vision-Language Models (VLMs) are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements. Unlike traditional pattern-triggered backdoors that rely on artificial triggers such as pixel patches or special tokens, Hidden Ads activates on natural user behaviors: when users upload images containing semantic content of interest (e.g., food, cars, animals) and ask recommendation-seeking questions, the backdoored model provides correct, helpful answers while seamlessly appending attacker-specified promotional slogans. This design preserves model utility and produces natural-sounding injections, making the attack practical for real-world deployment in consumer-facing recommendation services. We propose a multi-tier threat framework to systematically evaluate Hidden Ads across three adversary capability levels: hard prompt injection, soft prompt optimization, and supervised fine-tuning. Our poisoned data generation pipeline uses teacher VLM-generated chain-of-thought reasoning to create natural trigger--slogan associations across multiple semantic domains. Experiments on three VLM architectures demonstrate that Hidden Ads achieves high injection efficacy with near-zero false positives while maintaining task accuracy. Ablation studies confirm that the attack is data-efficient, transfers effectively to unseen datasets, and scales to multiple concurrent domain-slogan pairs. We evaluate defenses including instruction-based filtering and clean fine-tuning, finding that both fail to remove the backdoor without causing significant utility degradation.

Paper Structure

This paper contains 45 sections, 7 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Comparison of backdoor behaviors in VLMs.Top: Traditional backdoors rely on synthetic triggers (e.g., noise patches or colored frames) that cause attacker-chosen outputs (often incorrect or off-task). Middle: In benign use, a clean model produces the correct task-only answer. Bottom: Our Hidden Ads attack is behavior-triggered by a dual key, i.e., an animal semantic target and a recommendation intent keyword, and outputs the correct answer while appending a slogan.
  • Figure 2: Three-tier threat model of Hidden Ads. Tier 1: adversary controls the system prompt. Tier 2: adversary optimizes learnable prefix embeddings while keeping the VLM frozen. Tier 3: adversary fine-tunes model weights on poisoned data. Snowflakes denote frozen components; flames denote adversary-controlled components.
  • Figure 3: Injection F1 vs. task accuracy across attack tiers. Tier 1 (orange) clusters in the low-injection, low-accuracy region. Tier 2 (green) achieves high injection but moderate accuracy. Tier 3 (pink) reaches the ideal upper-right region with both high injection and high utility.
  • Figure 4: Cross-domain transfer of Tier 3 backdoors. Recall (bar height) and F1 (annotations) on out-of-distribution datasets.
  • Figure 5: Defense analysis on Tier 3 InternVL3-2B (Food). We compare instruction-based defense (orange dashed) against clean data fine-tuning with increasing data budgets (green lines).
  • ...and 1 more figures