Table of Contents
Fetching ...

In-Context Defense in Computer Agents: An Empirical Study

Pei Yang, Hai Ci, Mike Zheng Shou

TL;DR

This paper tackles context deception attacks on Vision-Language Model–powered computer agents by introducing in-context defense that uses a small set of defensive and benign exemplars with chain-of-thought reasoning to force defensive analysis before action planning. The approach is formalized through a threat model and exemplar construction, and demonstrated across three attack types (pop-up windows, environment injections, and environmental distractions) and multiple backbone models, achieving up to 100% reduction in attack success for distract ads and substantial gains for other attacks. The findings indicate that defensive reasoning must precede action planning and that a minimal number of exemplars (fewer than three) can be sufficient to induce robust defensive behavior, with strong generalization to unseen attacks (including out-of-distribution exemplars). The work provides a practical, non-finetuning defense framework that improves reliability and trustworthiness of computer agents in real-world interfaces, offering a foundation for safer multimodal AI systems.

Abstract

Computer agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction, enabling users to perform complex tasks through natural language instructions. However, these agents are vulnerable to context deception attacks, an emerging threat where adversaries embed misleading content into the agent's operational environment, such as a pop-up window containing deceptive instructions. Existing defenses, such as instructing agents to ignore deceptive elements, have proven largely ineffective. As the first systematic study on protecting computer agents, we introduce textbf{in-context defense}, leveraging in-context learning and chain-of-thought (CoT) reasoning to counter such attacks. Our approach involves augmenting the agent's context with a small set of carefully curated exemplars containing both malicious environments and corresponding defensive responses. These exemplars guide the agent to first perform explicit defensive reasoning before action planning, reducing susceptibility to deceptive attacks. Experiments demonstrate the effectiveness of our method, reducing attack success rates by 91.2% on pop-up window attacks, 74.6% on average on environment injection attacks, while achieving 100% successful defenses against distracting advertisements. Our findings highlight that (1) defensive reasoning must precede action planning for optimal performance, and (2) a minimal number of exemplars (fewer than three) is sufficient to induce an agent's defensive behavior.

In-Context Defense in Computer Agents: An Empirical Study

TL;DR

This paper tackles context deception attacks on Vision-Language Model–powered computer agents by introducing in-context defense that uses a small set of defensive and benign exemplars with chain-of-thought reasoning to force defensive analysis before action planning. The approach is formalized through a threat model and exemplar construction, and demonstrated across three attack types (pop-up windows, environment injections, and environmental distractions) and multiple backbone models, achieving up to 100% reduction in attack success for distract ads and substantial gains for other attacks. The findings indicate that defensive reasoning must precede action planning and that a minimal number of exemplars (fewer than three) can be sufficient to induce robust defensive behavior, with strong generalization to unseen attacks (including out-of-distribution exemplars). The work provides a practical, non-finetuning defense framework that improves reliability and trustworthiness of computer agents in real-world interfaces, offering a foundation for safer multimodal AI systems.

Abstract

Computer agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction, enabling users to perform complex tasks through natural language instructions. However, these agents are vulnerable to context deception attacks, an emerging threat where adversaries embed misleading content into the agent's operational environment, such as a pop-up window containing deceptive instructions. Existing defenses, such as instructing agents to ignore deceptive elements, have proven largely ineffective. As the first systematic study on protecting computer agents, we introduce textbf{in-context defense}, leveraging in-context learning and chain-of-thought (CoT) reasoning to counter such attacks. Our approach involves augmenting the agent's context with a small set of carefully curated exemplars containing both malicious environments and corresponding defensive responses. These exemplars guide the agent to first perform explicit defensive reasoning before action planning, reducing susceptibility to deceptive attacks. Experiments demonstrate the effectiveness of our method, reducing attack success rates by 91.2% on pop-up window attacks, 74.6% on average on environment injection attacks, while achieving 100% successful defenses against distracting advertisements. Our findings highlight that (1) defensive reasoning must precede action planning for optimal performance, and (2) a minimal number of exemplars (fewer than three) is sufficient to induce an agent's defensive behavior.

Paper Structure

This paper contains 22 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Overview of in-context defense versus prompting-based defense. While prompting-based defense relies on a single defensive prompt to protect against attacks, our in-context defense leverages carefully curated benign and malicious exemplars in the model's context window. These exemplars guide the model to first perform defensive reasoning to identify potential threats, followed by action planning, resulting in more effective defense against deceptive elements like pop-up windows and HTML injections.
  • Figure 2: Benign and defensive exemplars in the input/output space of VisualWebArena agent visualwebarena. SoM textual labels omitted except for in Defensive Exemplar 2.
  • Figure 3: Qualitative effectiveness of CoT-based in-context defense. The second and third rows compare model behavior without and with defense against context deception attacks. Without defense, the agent fails to recognize misleading elements and follows deceptive elements. With defense, the agent conducts structured risk assessment, correctly identifying and avoiding distractions such as pop-up windows, injected HTML elements, and misleading prompts.
  • Figure 4: Agent's behavior responding to pop-up window attacks under different defense methods. Explicit instructions fail to prevent the agent from engaging with pop-up windows, as agents would rationalize them as legitimate links. In comparison, CoT-based defense enables structured risk assessment, ensuring trustworthy action planning.
  • Figure 5: Visualization of in-distribution (IND) and out-of-distribution (OOD) exemplars, highlighting the tampered regions. IND exemplars maintain consistent window UI elements while embedding deceptive tasks, whereas OOD exemplars demonstrate varied UI aesthetics or different deception strategies.
  • ...and 4 more figures