VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

Jungjae Lee; Dongjae Lee; Chihun Choi; Youngmin Im; Jaeyoung Wi; Kihong Heo; Sangeun Oh; Sunjae Lee; Insik Shin

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

Jungjae Lee, Dongjae Lee, Chihun Choi, Youngmin Im, Jaeyoung Wi, Kihong Heo, Sangeun Oh, Sunjae Lee, Insik Shin

TL;DR

<p>VeriSafe Agent (VSA) introduces a deterministic, logic-based pre-action verification framework for Mobile GUI Agents to address the unreliability and safety risks of Large Foundation Models in mobile task automation. It translates natural-language user instructions into a formal DSL via autoformalization and uses an Intent Encoder and Verification Engine to pre-verify actions before they are executed, guided by structured feedback. A Developer Library defines app states and transitions, and a Finite State Automaton models state progress to enable safe, offline verification of user tasks. Across 300 instructions in 18 apps, VSA achieves high verification accuracy ($94.33\%$ to $98.33\%$) with low false positives ($1.3\%$) and negatives ($0.3\%$), while substantially increasing task success and reducing LLM dependency, albeit with a one-time development burden for predicate definition. This work demonstrates a practical, scalable path toward formally grounded, safer AI-assisted mobile GUI automation.</p>

Abstract

Large Foundation Models (LFMs) have unlocked new possibilities in human-computer interaction, particularly with the rise of mobile Graphical User Interface (GUI) Agents capable of interacting with mobile GUIs. These agents allow users to automate complex mobile tasks through simple natural language instructions. However, the inherent probabilistic nature of LFMs, coupled with the ambiguity and context-dependence of mobile tasks, makes LFM-based automation unreliable and prone to errors. To address this critical challenge, we introduce VeriSafe Agent (VSA): a formal verification system that serves as a logically grounded safeguard for Mobile GUI Agents. VSA deterministically ensures that an agent's actions strictly align with user intent before executing the action. At its core, VSA introduces a novel autoformalization technique that translates natural language user instructions into a formally verifiable specification. This enables runtime, rule-based verification of agent's actions, detecting erroneous actions even before they take effect. To the best of our knowledge, VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. We implement VSA using off-the-shelf LFM services (GPT-4o) and evaluate its performance on 300 user instructions across 18 widely used mobile apps. The results demonstrate that VSA achieves 94.33%-98.33% accuracy in verifying agent actions, outperforming existing LFM-based verification methods by 30.00%-16.33%, and increases the GUI agent's task completion rate by 90%-130%.

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

TL;DR

Abstract

VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (8)