Mind the Web: The Security of Web Use Agents
Avishag Shapira, Parth Atulbhai Gandhi, Edan Habler, Asaf Shabtai
TL;DR
Mind the Web demonstrates that web-use agents, while enabling powerful automated browsing, expose a hidden attack surface through the content they encounter on real sites. The authors propose task-aligned injection, which frames malicious commands as task-supporting guidance and exploits the limits of LLM contextual reasoning, and they implement a scalable offline RL pipeline (diverse candidate generation, LLM validation, SFT then DPO) to generate effective payloads. Evaluations across five agent implementations, with payloads organized by the CIA triad, report an attack success rate (ASR) above 80% and strong transferability to unseen payloads, environments, and models, including safety-tuned ones. The paper also outlines layered mitigations (oversight, execution constraints, and task-aware reasoning), highlighting security-usability tradeoffs and the need for safer web-use agent deployment.
Abstract
Web-use agents are rapidly being deployed to automate complex web tasks with extensive browser capabilities. However, these capabilities create a critical and previously unexplored attack surface. This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages, such as comments, reviews, or advertisements, that agents encounter during legitimate browsing tasks. We introduce the task-aligned injection technique, which frames malicious commands as helpful task guidance rather than obvious attacks, exploiting fundamental limitations in LLMs' contextual reasoning. Agents struggle to maintain coherent contextual awareness and fail to detect when seemingly helpful web content contains steering attempts that divert them from their original task goal. To scale this attack, we developed an automated three-stage pipeline that generates effective injections without manual annotation or costly online agent interactions during training, and that remains efficient even with limited training data. This pipeline produces a generator model that we evaluate on five popular agents using payloads organized by the Confidentiality-Integrity-Availability (CIA) security triad, including unauthorized camera activation, file exfiltration, user impersonation, phishing, and denial-of-service. The generator achieves an attack success rate (ASR) of over 80%, with strong transferability across unseen payloads, diverse web environments, and different underlying LLMs. The attack succeeds even against agents with built-in safety mechanisms and requires only the ability to post content on public websites. To address this risk, we propose comprehensive mitigation strategies, including oversight mechanisms, execution constraints, and task-aware reasoning techniques.
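The attack success rate quoted above can be read as the fraction of evaluation trials in which the agent carried out the injected payload. The sketch below illustrates that reading and a per-category breakdown along the CIA triad. It is not the paper's evaluation code: the `Trial` record, the function names, and the sample outcomes are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One evaluation run (hypothetical record, not the paper's schema)."""
    category: str    # "confidentiality", "integrity", or "availability"
    succeeded: bool  # True if the agent executed the injected payload


def attack_success_rate(trials):
    """Return the fraction of trials in which the attack succeeded."""
    if not trials:
        raise ValueError("ASR is undefined for an empty set of trials")
    return sum(t.succeeded for t in trials) / len(trials)


def asr_by_category(trials):
    """Group trials by CIA category and compute ASR within each group."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[trial.category].append(trial)
    return {cat: attack_success_rate(ts) for cat, ts in grouped.items()}


# Placeholder outcomes for illustration only; these are not reported results.
sample_trials = [
    Trial("confidentiality", True),
    Trial("confidentiality", True),
    Trial("integrity", True),
    Trial("integrity", False),
    Trial("availability", True),
]

overall_asr = attack_success_rate(sample_trials)
per_category_asr = asr_by_category(sample_trials)
```

Reporting the per-category values alongside the overall figure shows whether a high aggregate ASR hides a category in which the attack is much less effective.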
