Attacking Vision-Language Computer Agents via Pop-ups
Yanzhe Zhang, Tao Yu, Diyi Yang
TL;DR
This work demonstrates that vision-language model (VLM) agents operating over GUIs are vulnerable to adversarial pop-ups that humans readily recognize but that can mislead agents into clicking them. The authors design a pop-up framework with four elements (Attention Hook, Instruction, Info Banner, ALT Descriptor) and integrate these pop-ups into OSWorld and VisualWebArena. Across multiple VLM backbones, the attack achieves high success rates and causes substantial drops in task success rate. Ablation studies identify which components drive attack success (notably the attention hook and the ALT descriptor) and show that simple defenses are insufficient, motivating the exploration of step-wise defenses and broader mitigation strategies. The findings underscore real-world safety risks for autonomous GUI agents and call for robust grounding, threat-model-aware training, and human-in-the-loop oversight to prevent malicious manipulation of automated agents.
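To make the four-element structure concrete, the sketch below models a pop-up as a record with one field per element, plus a helper that blanks a single element the way a one-component ablation would. This is a hypothetical illustration, not the authors' implementation: the class name `AdversarialPopup`, the field roles described in the comments, the top-to-bottom text layout, and the accessibility-tree node format are all assumptions.

```python
from dataclasses import dataclass, replace

# Names of the four elements listed in the TL;DR (field names are hypothetical).
ELEMENTS = ("attention_hook", "instruction", "info_banner", "alt_descriptor")


@dataclass(frozen=True)
class AdversarialPopup:
    """Hypothetical container for the four pop-up elements; roles are assumed."""
    attention_hook: str   # assumed: short text meant to catch the agent's attention
    instruction: str      # assumed: text telling the agent what to do
    info_banner: str      # assumed: banner text displayed on the pop-up
    alt_descriptor: str   # assumed: text exposed via a non-visual channel (e.g., a11y tree)

    def visible_text(self) -> str:
        """Rendered text a screenshot-based agent would see (assumed layout).

        Empty elements are skipped, so an ablated pop-up simply omits that line.
        """
        parts = [self.attention_hook, self.instruction, self.info_banner]
        return "\n".join(p for p in parts if p)

    def a11y_node(self) -> dict:
        """Assumed accessibility-tree entry for the clickable pop-up."""
        return {"role": "button", "name": self.alt_descriptor}

    def without(self, element: str) -> "AdversarialPopup":
        """Return a copy with one element blanked (a single-component ablation)."""
        if element not in ELEMENTS:
            raise ValueError(f"unknown element: {element}")
        return replace(self, **{element: ""})
```

Because the dataclass is frozen, each ablation variant is a new object, so the full pop-up and its ablated copies can be evaluated side by side without accidental mutation.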
Abstract
Autonomous agents powered by large vision-language models (VLMs) have demonstrated significant potential in completing everyday computer tasks, such as browsing the web to book travel and operating desktop software, which require agents to understand graphical interfaces. Although such visual inputs are becoming more integrated into agentic applications, the types of risks and attacks that exist around them remain unclear. In this work, we demonstrate that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups, which human users would typically recognize and ignore. These pop-ups distract agents into clicking them instead of performing their tasks as usual. Integrating the pop-ups into existing agent testing environments such as OSWorld and VisualWebArena yields an attack success rate (the frequency with which the agent clicks the pop-ups) of 86% on average and decreases the task success rate by 47%. Basic defense techniques, such as asking the agent to ignore pop-ups or including an advertisement notice, are ineffective against the attack.
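The two headline metrics can be computed from per-episode logs as sketched below. This is a minimal sketch under stated assumptions: the `Episode` record is hypothetical, an episode counts as a successful attack if the agent clicks the pop-up at least once, and because this abstract does not say whether the 47% task-success decrease is absolute or relative, the helper returns both.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Episode:
    """One agent run on one task (hypothetical log format)."""
    clicked_popup: bool  # assumed: True if the agent clicked the pop-up at any step
    task_success: bool   # True if the agent completed the original task


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


def attack_success_rate(attacked: Sequence[Episode]) -> float:
    """Share of attacked episodes in which the agent clicked the pop-up."""
    return rate([e.clicked_popup for e in attacked])


def success_rate_drop(clean: Sequence[Episode],
                      attacked: Sequence[Episode]) -> Tuple[float, float]:
    """Return (absolute drop, relative drop) in task success rate."""
    base = rate([e.task_success for e in clean])
    under_attack = rate([e.task_success for e in attacked])
    absolute = base - under_attack
    relative = absolute / base if base > 0 else 0.0
    return absolute, relative
```

For example, if three of four clean runs succeed and two of four attacked runs succeed, the absolute drop is 0.25 (25 percentage points) while the relative drop is about 0.33, which is why the distinction matters when comparing reported numbers.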
