Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

Zhenhua Zou; Sheng Guo; Qiuyang Zhan; Lepeng Zhao; Shuo Li; Qi Li; Ke Xu; Mingwei Xu; Zhuotao Liu

TL;DR

The paper identifies critical security flaws in the prevailing Screen-as-Interface model for LLM-powered mobile agents and introduces Aura, an Agent Universal Runtime Architecture built around an OS-resident Agent Kernel. Aura replaces visual scraping with structured, intent-driven A2A interaction in a Hub-and-Spoke topology, anchored by cryptographic identity (GAR/AIC), a Semantic Firewall, taint-aware memory, and auditable execution. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura raises low-risk task success from roughly 75% to 94.3%, cuts high-risk attack success from roughly 40% to 4.4%, and reduces latency by nearly an order of magnitude by eliminating GUI processing bottlenecks. The authors argue that a secure agent-native OS design enables an effective Agent Economy with robust accountability and cross-device potential, offering a practical path beyond GUI-grounded mobile agents toward secure, scalable agent-native systems.

Abstract

The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions - Agent Identity, External Interface, Internal Reasoning, and Action Execution - revealing critical flaws such as fake App identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation stemming from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology where a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and achieves near-order-of-magnitude latency gains. These results demonstrate Aura as a viable, secure alternative to the "Screen-as-Interface" paradigm.
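The Hub-and-Spoke mediation described in the abstract can be illustrated with a small sketch. This is not Aura's implementation: the class `AgentKernel`, the `sign` helper, the HMAC-based tags (a simple stand-in for the paper's cryptographic identity binding), and the agent names are all hypothetical and chosen only to show the shape of the idea, namely that every message passes through one mediator that checks the sender's registered identity and records the decision.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def sign(key: bytes, payload: str) -> str:
    """Hypothetical helper: an agent tags its message with its own secret key."""
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


@dataclass
class AgentKernel:
    """Toy hub: agents never call each other directly, only via route()."""
    registry: dict = field(default_factory=dict)   # agent_id -> key (stand-in for a registry entry)
    handlers: dict = field(default_factory=dict)   # agent_id -> callable that handles an intent
    audit_log: list = field(default_factory=list)  # append-only record of every decision

    def register(self, agent_id: str, key: bytes, handler) -> None:
        self.registry[agent_id] = key
        self.handlers[agent_id] = handler

    def route(self, sender: str, receiver: str, payload: str, tag: str):
        key = self.registry.get(sender)
        # Allow only if the sender is registered, the receiver exists,
        # and the tag verifies under the sender's registered key.
        ok = (
            key is not None
            and receiver in self.handlers
            and hmac.compare_digest(tag, sign(key, payload))
        )
        self.audit_log.append((sender, receiver, payload, "allow" if ok else "deny"))
        if not ok:
            raise PermissionError(f"kernel rejected message from {sender!r}")
        return self.handlers[receiver](payload)


# Usage: a System Agent sends an intent to a sandboxed App Agent through the kernel.
kernel = AgentKernel()
kernel.register("system", b"system-key", lambda intent: None)
kernel.register("mail", b"mail-key", lambda intent: f"mail handled: {intent}")
reply = kernel.route("system", "mail", "read_latest", sign(b"system-key", "read_latest"))
```

An unregistered sender, or a message whose tag does not verify, is rejected and still logged. That is the property the sketch is meant to convey: the spokes have no path around the hub, so identity checks and auditing happen in one place.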

Paper Structure (50 sections, 3 equations, 15 figures, 3 tables)

Introduction
The Lifecycle Security Analysis of Production Mobile Agents
Identity & Trust Anchoring
Perceptual Authenticity & Safety
Cognitive Security & Planning Integrity
Action Access Control & Accountability
System Design and Architecture
From Pixels to Intents
Architecture Overview
Defense and Mechanism Design: The Agent Kernel
Cryptographically Attested Identity Infrastructure
The Global Agent Registry (GAR)
The Agent Identity Card (AIC)
AIC Provisioning and Key Management
Runtime Mediation and Mutual Attestation
...and 35 more sections

Figures (15)

Figure 1: Overview of our lifecycle-based security analysis for GUI-based mobile agents, mapping the adversarial attack surface onto four phases (i.e., Identity & Trust Anchoring, Perceptual Authenticity & Safety, Cognitive Security, and Action Access Control & Accountability) across the mobile OS control stack and highlighting the key vulnerabilities at each layer.
Figure 2: Identity Confusion Test: The agent fails to distinguish between the legitimate WeChat App (or Gmail App) and a fake counterpart, and launches the fake App upon user request.
Figure 3: Visual Hallucination Test: The agent misidentifies a visual element on a web page or inside an App, leading to incorrect interaction planning.
Figure 4: Phishing Susceptibility Test: The user prompts the agent to check a recently received email. The agent navigates to the email content, clicks on a link without verifying the URL, and navigates to a potentially malicious site without warning.
Figure 5: Stealthy "In-App Helper" Attack: A malicious accessibility-based helper App monitors the agent's Gmail task on the agent-hosted virtual display, injects a Gmail-specific "Network Lag Detected" overlay that appears only in that virtual display, and poisons the agent's plan so that clicking the seemingly benign Refresh button actually triggers a hidden background action (e.g., confirming a subscription or enabling data exfiltration).
...and 10 more figures
