Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible

Lepeng Zhao, Zhenhua Zou, Shuo Li, Zhuotao Liu

TL;DR

This work tackles the privacy risks of mobile GUI agents that typically access unredacted screen content by introducing an anonymization-based framework that keeps sensitive data available for task execution but invisible to cloud reasoning. It builds a four-layer on-device pipeline (PII detection, UI transformation, secure interaction proxy, and local privacy gatekeeper) with deterministic, type-preserving placeholders and a session-scoped mapping to maintain cross-modal grounding. Empirical results on AndroidLab and PrivScreen show substantially reduced privacy leakage across multiple models, with only modest declines in task utility and acceptable on-device overhead. The approach demonstrates a practical path toward privacy-preserving, cloud-augmented GUI automation, while highlighting systemic limitations of current OS interfaces and suggesting future directions for semantically structured, privacy-aware agent interactions.

Abstract

Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability. We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary. Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods.
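
The placeholder scheme described above (deterministic, type-preserving, with a session-scoped mapping) can be illustrated with a minimal sketch. The abstract does not say how the suffix after `#` is derived, so the keyed hash (HMAC-SHA256 truncated to five hex characters), the class name `SessionAnonymizer`, and its methods are assumptions made for illustration only, not the authors' implementation.

```python
import hashlib
import hmac
import secrets


class SessionAnonymizer:
    """Hypothetical session-scoped mapping between raw values and placeholders."""

    def __init__(self, session_key=None, suffix_len=5):
        # Assumption: a fresh random key per session, so placeholders
        # cannot be linked across sessions.
        self._key = session_key or secrets.token_bytes(32)
        self._suffix_len = suffix_len
        self._forward = {}  # (pii_type, raw_value) -> placeholder
        self._reverse = {}  # placeholder -> raw_value

    def anonymize(self, pii_type, raw_value):
        """Return the same placeholder for the same (type, value) in a session."""
        cache_key = (pii_type, raw_value)
        if cache_key in self._forward:
            return self._forward[cache_key]

        message = f"{pii_type}:{raw_value}".encode("utf-8")
        digest = hmac.new(self._key, message, hashlib.sha256).hexdigest()

        # Keep the semantic category, drop the identifying details.
        length = self._suffix_len
        placeholder = f"{pii_type}#{digest[:length]}"
        # If two different values share a truncated suffix, lengthen it.
        while placeholder in self._reverse:
            length += 1
            if length > len(digest):
                raise RuntimeError("could not derive a unique placeholder")
            placeholder = f"{pii_type}#{digest[:length]}"

        self._forward[cache_key] = placeholder
        self._reverse[placeholder] = raw_value
        return placeholder

    def resolve(self, placeholder):
        """Map a placeholder back to its raw value (local, on-device only)."""
        return self._reverse[placeholder]


anonymizer = SessionAnonymizer()
token = anonymizer.anonymize("PHONE_NUMBER", "555-0100")
# The same value yields the same token in the instruction, XML, and screenshot text.
assert token == anonymizer.anonymize("PHONE_NUMBER", "555-0100")
```

Because the mapping is deterministic within a session, the same phone number receives the same token wherever it appears, which is what lets the cloud agent ground references across modalities without seeing the raw value.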

Paper Structure

This paper contains 37 sections, 4 equations, 4 figures, 10 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Overview of the proposed privacy protection framework for mobile GUI agents. The system inserts a trusted local privacy layer between the mobile phone and a cloud-based GUI agent. User instructions and UI states (XML hierarchies and screenshots) are first processed locally to detect sensitive content and replace it with type-preserving anonymized placeholders, producing an anonymized Virtual UI for agent reasoning. The cloud agent operates exclusively on this anonymized interface and issues actions based on placeholders. All actions are intercepted by a local interaction proxy, which resolves anonymized references and executes them on the phone using original values when necessary. For tasks requiring operations over raw sensitive data, a local privacy gatekeeper performs limited computation and returns only non-sensitive results to the agent.
  • Figure 2: Example of category-preserving anonymization of user instructions.
  • Figure 3: Comparison of screenshots before and after anonymization. The left image shows the original screen before anonymization, while the right image illustrates the anonymized version. The text in black regions highlights enlarged excerpts of the magenta regions to illustrate the corresponding content.
  • Figure 4: Example of Type proxy resolution. The text in black regions highlights enlarged excerpts of the magenta regions to illustrate the corresponding content.
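
The proxy resolution described in the captions for Figures 1 and 4 can be sketched as a local substitution step applied to each agent action before it is executed on the phone. The action format (a dictionary with `action` and `text` fields), the function name `resolve_action`, and the placeholder pattern are hypothetical and chosen only to match the `PHONE_NUMBER#a1b2c` example; the actual proxy interface is not specified here.

```python
import re

# Assumed placeholder shape: upper-case type name, '#', lower-case hex suffix.
PLACEHOLDER_RE = re.compile(r"[A-Z][A-Z_]*#[0-9a-f]+")


def resolve_action(action, mapping):
    """Return a copy of `action` with known placeholders replaced by raw values.

    `mapping` is the session-scoped placeholder -> raw value table, which
    never leaves the device. Unknown placeholders are left unchanged.
    """

    def substitute(match):
        token = match.group(0)
        return mapping.get(token, token)

    resolved = dict(action)
    for field, value in action.items():
        if isinstance(value, str):
            resolved[field] = PLACEHOLDER_RE.sub(substitute, value)
    return resolved


# The cloud agent only ever sees and emits the placeholder.
session_mapping = {"PHONE_NUMBER#a1b2c": "555-0100"}
agent_action = {"action": "type", "text": "PHONE_NUMBER#a1b2c"}

# The local proxy resolves it just before executing the action on the device.
device_action = resolve_action(agent_action, session_mapping)
assert device_action["text"] == "555-0100"
```

The original action is left untouched, so the anonymized form can still be logged or returned to the agent without exposing the raw value.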