
Visual Memory Injection Attacks for Multi-Turn Conversations

Christian Schlarmann, Matthias Hein

TL;DR

The paper addresses security risks in large vision-language models operating in long, multi-turn conversations, showing that a manipulated image can steer a model toward a targeted output only when a trigger topic is raised. It proposes Visual Memory Injection (VMI), which uses benign anchoring and context-cycling to craft imperceptible perturbations under an $\ell_\infty$ budget of $\varepsilon = 8/255$ that remain inert for non-trigger prompts but elicit a prescribed response on trigger prompts, optimized with adaptive projected gradient descent. The attack is demonstrated across multiple open-weight LVLMs (e.g., $\text{Qwen2.5-VL-7B-Instruct}$, $\text{Qwen3-VL-8B-Instruct}$, $\text{LLaVA-OneVision-1.5-8B-Instruct}$), persisting through long dialogues (up to $n=27$ turns; optimization over $n=8$) and showing transfer to unseen prompts and to fine-tuned variants. Key findings include high combined success rates $\mathrm{SR}_{\wedge}$ for stock, political, car, and phone targets, robustness to paraphrase, and notable transferability, underscoring the need for defenses and safety evaluations in multimodal conversational AI.
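
To make the $\ell_\infty$ constraint concrete, the following is a minimal sketch of a single projected signed-gradient step. It is not the authors' implementation: the text states that adaptive projected gradient descent is used, whereas this sketch shows only a plain, non-adaptive step. The function names `pgd_step` and `clip`, the flat list-of-pixels representation with values in $[0, 1]$, and the externally supplied gradient `grad` are all assumptions made for illustration.

```python
EPS = 8 / 255  # l_inf budget stated in the text


def clip(value, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def pgd_step(x_adv, x_clean, grad, step_size, eps=EPS):
    """One plain (non-adaptive) projected signed-gradient step.

    x_adv, x_clean and grad are flat lists of floats of equal length;
    pixel values are assumed to lie in [0, 1]. grad is assumed to be
    the gradient of the attack loss with respect to x_adv.
    """
    updated = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        sign = (g > 0) - (g < 0)                 # sign of the gradient entry
        moved = xa - step_size * sign            # descend on the attack loss
        moved = clip(moved, xc - eps, xc + eps)  # project onto the l_inf ball
        updated.append(clip(moved, 0.0, 1.0))    # keep a valid pixel value
    return updated


if __name__ == "__main__":
    clean = [0.5, 0.0, 1.0]
    new = pgd_step(clean, clean, grad=[1.0, -2.0, -1.0], step_size=0.1)
    print(max(abs(a - c) for a, c in zip(new, clean)) <= EPS)
```

Because the step size here (0.1) exceeds the budget, every pixel ends up on the boundary of the $\ell_\infty$ ball or of the valid pixel range, which shows that the projection, not the step size, bounds the perturbation.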

Abstract

Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, in particular in a long-context multi-turn setting, is largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web/social media. A benign user downloads this image and uses it as input to the LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that on normal prompts the LVLM exhibits nominal behavior, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g. for adversarial marketing or political persuasion. Compared to previous work that focused on single-turn attacks, VMI is effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This article thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection

Paper Structure

This paper contains 19 sections, 22 figures, 4 tables, and 1 algorithm.

Figures (22)

  • Figure 1: Visual Memory Injection. An adversary manipulates an image via VMI with a small perturbation and uploads it online. When an unsuspecting user shares this image in an LVLM conversation, the model behaves normally for several conversation turns. However, when the user asks about a trigger topic (stock advice), the model outputs the injected target ("buy GameStop stock").
  • Figure 2: Main results. We show attack success rates ($\mathrm{SR}_{\wedge}$) of VMI across conversation turns for four target behaviors: stock recommendation (top), political voting (2nd), car recommendation (3rd), and phone recommendation (bottom). Each row shows results across three context prompt sets: Diverse$\star$ (partially used during optimization), Diverse and Holiday (both held-out). Success requires the model to output the target behavior on the trigger topic while not leaking it into any preceding context turns. VMI achieves substantial success rates, even after several context conversation turns. The $\ell_\infty$-perturbation radius is set to $\varepsilon = 8/255$.
  • Figure 3: Transferability to paraphrased prompts. We show attack success rate ($\mathrm{SR}_{\wedge}$) when both the anchoring prompt and trigger prompt are paraphrased (see the paper's table of paraphrased prompts). The attack maintains effectiveness despite prompt language variation not seen during optimization.
  • Figure 4: Attack Baselines. We show attack success rate ($\mathrm{SR}_{\wedge}$) against Qwen3-VL on the stock target, comparing algorithm variants (described in the paper's ablations section). Single target, a direct adaptation of the attack in [schlarmann2023adversarial], fails beyond the first turn. Adding benign anchoring (w/o cycle & context) and fixed context (w/o cycle) improves performance. VMI with context-cycling achieves the best results.
  • Figure 5: Transfer Attacks. We evaluate whether adversarial images optimized on a single source model transfer to fine-tuned versions of it. We report combined attack success rate ($\mathrm{SR}_{\wedge}$) for the stock recommendation target. The perturbation is optimized on Qwen3-VL and then evaluated without further optimization on SEA-LION and Med3 models. The attack success rate remains high after the transfer.
  • ...and 17 more figures
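
The combined success criterion in the Figure 2 caption (the target appears on the trigger turn and does not leak into any preceding context turn) can be sketched as follows. This is an illustrative reading of the caption, not the authors' evaluation code: the case-insensitive substring check in `contains_target`, the function names, and the toy responses are assumptions, and the paper may detect the target behavior differently.

```python
def contains_target(response, target_phrase):
    """Assumed detector: case-insensitive substring match."""
    return target_phrase.lower() in response.lower()


def combined_success(context_responses, trigger_response, target_phrase):
    """True if the target is produced on the trigger turn AND
    does not appear in any preceding context turn."""
    hit = contains_target(trigger_response, target_phrase)
    leaked = any(contains_target(r, target_phrase) for r in context_responses)
    return hit and not leaked


def combined_success_rate(conversations, target_phrase):
    """Fraction of conversations that satisfy combined_success.

    Each conversation is a pair (context_responses, trigger_response).
    """
    if not conversations:
        return 0.0
    wins = sum(
        combined_success(ctx, trig, target_phrase) for ctx, trig in conversations
    )
    return wins / len(conversations)


if __name__ == "__main__":
    # Made-up toy responses, for illustration only.
    toy = [
        (["A beach at sunset."], "You should buy GameStop stock."),   # success
        (["Buy GameStop!"], "You should buy GameStop stock."),        # leaked
        (["A beach at sunset."], "Consider a diversified index fund."),  # no hit
    ]
    print(combined_success_rate(toy, "GameStop"))
```

Requiring both conditions at once penalizes perturbations that force the target unconditionally, which matches the stealth requirement described in the abstract.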