Table of Contents
Fetching ...

The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li

TL;DR

This work investigates privacy and security vulnerabilities of LLM-powered GUI agents manipulating real-world web interfaces. It introduces Fine-Print Injection (FPI) and conducts a large-scale study across six agents, six attack types, 234 adversarial pages, and 39 human participants, revealing strong susceptibility to contextually embedded threats and a notable privacy–utility trade-off among foundation models. The findings show that more capable models achieve higher task success but are more prone to manipulation, while conservative agents reduce risk at the cost of automation. The results underscore the need for saliency-aware parsing, interface-level safeguards, and human-in-the-loop designs to enable safe deployment of GUI agents in high-risk domains.

Abstract

A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, and tapping. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. However, this autonomy introduces new privacy and security risks. Adversaries can inject malicious content into the GUIs that alters agent behaviors or induces unintended disclosures of private information. These attacks often exploit the discrepancy between visual saliency for agents and human users, or the agent's limited ability to detect violations of contextual integrity in task automation. In this paper, we characterized six types of such attacks, and conducted an experimental study to test these attacks with six state-of-the-art GUI agents, 234 adversarial webpages, and 39 human participants. Our findings suggest that GUI agents are highly vulnerable, particularly to contextually embedded threats. Moreover, human users are also susceptible to many of these attacks, indicating that simple human oversight may not reliably prevent failures. This misalignment highlights the need for privacy-aware agent design. We propose practical defense strategies to inform the development of safer and more reliable GUI agents.

The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

TL;DR

This work investigates privacy and security vulnerabilities of LLM-powered GUI agents manipulating real-world web interfaces. It introduces Fine-Print Injection (FPI) and conducts a large-scale study across six agents, six attack types, 234 adversarial pages, and 39 human participants, revealing strong susceptibility to contextually embedded threats and a notable privacy–utility trade-off among foundation models. The findings show that more capable models achieve higher task success but are more prone to manipulation, while conservative agents reduce risk at the cost of automation. The results underscore the need for saliency-aware parsing, interface-level safeguards, and human-in-the-loop designs to enable safe deployment of GUI agents in high-risk domains.

Abstract

A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, and tapping. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. However, this autonomy introduces new privacy and security risks. Adversaries can inject malicious content into the GUIs that alters agent behaviors or induces unintended disclosures of private information. These attacks often exploit the discrepancy between visual saliency for agents and human users, or the agent's limited ability to detect violations of contextual integrity in task automation. In this paper, we characterized six types of such attacks, and conducted an experimental study to test these attacks with six state-of-the-art GUI agents, 234 adversarial webpages, and 39 human participants. Our findings suggest that GUI agents are highly vulnerable, particularly to contextually embedded threats. Moreover, human users are also susceptible to many of these attacks, indicating that simple human oversight may not reliably prevent failures. This misalignment highlights the need for privacy-aware agent design. We propose practical defense strategies to inform the development of safer and more reliable GUI agents.

Paper Structure

This paper contains 44 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Claude's Computer-Use agent submitting a (fake) driver's license number to a customized phishing website. This is an example of stealing privacy information (SP) attack. The URL has been censored, and all personal information shown is fictitious. This example illustrates how GUI agents can be manipulated to leak sensitive data during routine task execution.
  • Figure 2: Theory-Informed Pathways to GUI Agent Privacy and Security Risks
  • Figure 3: Human Delegation Willingness Before and After Task.
  • Figure 4: GUI agent introduction shown to participants.