DualTAP: A Dual-Task Adversarial Protector for Mobile MLLM Agents
Fuyao Zhang, Jiaming Zhang, Che Wang, Xiongtao Sun, Yurong Hao, Guowei Guan, Wenjie Li, Longtao Huang, Wei Yang Bryan Lim
TL;DR
This work addresses the privacy risk posed by mobile GUI agents whose screenshot streams pass through untrusted routers, which can use their own MLLMs to extract PII. It introduces DualTAP, a dual-task adversarial protector that uses a contrastive attention module to target privacy-relevant regions and a dual-objective loss to minimize privacy leakage while preserving agent utility; the lightweight generator runs on-device at inference time. A new benchmark, PrivScreen, is released to evaluate both privacy leakage and task performance across diverse MLLMs. Experiments on six MLLMs show that DualTAP achieves state-of-the-art privacy protection, reducing the average leakage rate by 31.6 percentage points (a 3.0x relative improvement) while maintaining an 80.8% task success rate. Because the generator requires no large-scale pre-training and deploys efficiently on-device, the approach offers a concrete path toward mitigating privacy risks in real-world GUI automation.
Abstract
The reliance of mobile GUI agents on Multimodal Large Language Models (MLLMs) introduces a severe privacy vulnerability: screenshots containing Personally Identifiable Information (PII) are often sent to untrusted third-party routers, which can exploit their own MLLMs to mine this data and violate user privacy. Existing privacy perturbations fail to meet the dual challenge of this scenario, which requires protecting PII from the router's MLLM while preserving task utility for the agent's MLLM. To address this gap, we propose the Dual-Task Adversarial Protector (DualTAP), a novel framework that, for the first time, explicitly decouples these conflicting objectives. DualTAP trains a lightweight generator using two key innovations: (i) a contrastive attention module that precisely identifies and targets only the PII-sensitive regions, and (ii) a dual-task adversarial objective that simultaneously minimizes a task-preservation loss (to maintain agent utility) and a privacy-interference loss (to suppress PII leakage). To support this study, we introduce PrivScreen, a new dataset of annotated mobile screenshots designed specifically for dual-task evaluation. Comprehensive experiments on six diverse MLLMs (e.g., GPT-5) demonstrate DualTAP's state-of-the-art protection. It reduces the average privacy leakage rate by 31.6 percentage points (a 3.0x relative improvement) while maintaining an 80.8% task success rate, only 2.8 percentage points below the 83.6% unprotected baseline. DualTAP presents the first viable solution to the privacy-utility trade-off in mobile MLLM agents.
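To make the interplay between the two objectives concrete, the following is a minimal Python sketch under heavily simplified, hypothetical assumptions; it is not DualTAP's implementation. The agent's and router's MLLMs are replaced by linear surrogate scorers (`w_agent`, `w_router`), a pixel's saliency is assumed to be the absolute surrogate weight, and the contrastive attention mask is assumed to be the normalized positive part of privacy saliency minus task saliency. Instead of training a generator, the sketch optimizes a single perturbation per image with projected gradient steps inside an L-infinity budget `eps`. The combined loss mirrors the structure described above: a task-preservation term (squared drift of the agent's score) plus `lam` times a privacy-interference term (the router's PII score). All function and parameter names are illustrative.

```python
from typing import List, Tuple

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    """Score of a linear surrogate model (hypothetical stand-in for an MLLM)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def contrastive_mask(priv_saliency: Vector, task_saliency: Vector) -> Vector:
    """Hypothetical contrastive attention: keep pixels that matter more for
    PII extraction than for the task, normalized to [0, 1]."""
    raw = [max(0.0, p - t) for p, t in zip(priv_saliency, task_saliency)]
    peak = max(raw)
    return [r / peak for r in raw] if peak > 0 else raw


def apply_perturbation(x: Vector, mask: Vector, delta: Vector) -> Vector:
    """Add the masked perturbation and clip pixels to the valid range [0, 1]."""
    return [min(1.0, max(0.0, xi + mi * di)) for xi, mi, di in zip(x, mask, delta)]


def dual_task_loss(x: Vector, x_adv: Vector, w_agent: Vector,
                   w_router: Vector, lam: float) -> float:
    """Task-preservation loss plus lam times privacy-interference loss."""
    task_loss = (dot(w_agent, x_adv) - dot(w_agent, x)) ** 2  # agent output drift
    privacy_loss = dot(w_router, x_adv)                       # router's PII score
    return task_loss + lam * privacy_loss


def protect(x: Vector, w_agent: Vector, w_router: Vector, lam: float = 1.0,
            eps: float = 0.1, lr: float = 0.05, steps: int = 100) -> Tuple[Vector, Vector]:
    """Return the contrastive mask and the protected image."""
    mask = contrastive_mask([abs(w) for w in w_router], [abs(w) for w in w_agent])
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = apply_perturbation(x, mask, delta)
        drift = dot(w_agent, x_adv) - dot(w_agent, x)
        # Analytic gradient of dual_task_loss w.r.t. delta (pixel clipping ignored).
        grad = [m * (2.0 * drift * wa + lam * wr)
                for m, wa, wr in zip(mask, w_agent, w_router)]
        # Gradient step, then projection onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]
    return mask, apply_perturbation(x, mask, delta)


if __name__ == "__main__":
    x = [0.5, 0.5, 0.5, 0.5]
    w_agent = [1.0, 1.0, 0.0, 0.2]   # toy agent relies mostly on pixels 0-1
    w_router = [0.0, 0.5, 1.0, 1.0]  # toy router reads PII mostly from pixels 2-3
    mask, x_adv = protect(x, w_agent, w_router)
    print("mask:", mask)
    print("router score:", dot(w_router, x), "->", dot(w_router, x_adv))
    print("agent score:", dot(w_agent, x), "->", dot(w_agent, x_adv))
```

In this toy setup, pixels that the agent relies on at least as much as the router receive a zero mask and are left untouched, while a pixel that both surrogates weight can still be perturbed at the cost of a small drift in the agent's score. Raising `lam` weights privacy interference more heavily relative to that drift; lowering it favors utility.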
