Agent+P: Guiding UI Agents via Symbolic Planning

Shang Ma, Xusheng Xiao, Yanfang Ye

TL;DR

Agent+P tackles long-horizon UI automation by introducing a UI Transition Graph (UTG) that captures an app's global screen transitions and by using an external symbolic planner (via PDDL) to produce provably correct, optimal navigation plans. The framework is plug-and-play: it integrates with various LLM-based UI agents through four modules (UTG Builder, Node Selector, Plan Generator, and UI Explorer), giving the agent globally guided actions and reducing hallucinations. Evaluations on the AndroidWorld benchmark show success-rate improvements of up to 14% and a 37.7% reduction in action steps, indicating that symbolic planning makes agents both more reliable and more efficient. The approach generalizes to other domains where state-transition graphs are available and highlights opportunities for neuro-symbolic planning in UI automation and embodied AI. Acknowledged limitations include UTG accuracy, plan reliability under stochastic GUI behavior, and extension to multi-goal tasks.

Abstract

Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the action steps by 37.7%.
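
To make the pathfinding reformulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy UTG stored as an adjacency dictionary and uses breadth-first search as a stand-in for the off-the-shelf symbolic planner described in the abstract. The screen names, action labels, and the function `shortest_plan` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical UTG: each screen maps to a list of (action label, destination screen).
# A real UTG would come from program analysis of the app, not hand-written data.
UTG = {
    "Home": [("tap_settings", "Settings"), ("tap_new_event", "EventEditor")],
    "Settings": [("tap_back", "Home"), ("tap_about", "About")],
    "EventEditor": [("tap_save", "Home")],
    "About": [("tap_back", "Settings")],
}


def shortest_plan(utg, start, goal):
    """Return a fewest-action plan as a list of (action, resulting screen),
    or None if the goal screen is unreachable from the start screen."""
    if start == goal:
        return []
    # parent[screen] = (previous screen, action taken); the start has no parent.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for action, nxt in utg.get(node, []):
            if nxt in parent:
                continue  # already reached by a path that is no longer
            parent[nxt] = (node, action)
            if nxt == goal:
                # Walk the parent links back to the start to recover the plan.
                plan = []
                cur = nxt
                while parent[cur] is not None:
                    prev, act = parent[cur]
                    plan.append((act, cur))
                    cur = prev
                plan.reverse()
                return plan
            queue.append(nxt)
    return None


plan = shortest_plan(UTG, "Home", "About")
for action, screen in plan:
    print(f"{action} -> {screen}")
```

Because every edge in this toy graph has unit cost, breadth-first search returns a plan with the fewest actions. A symbolic planner working from a PDDL encoding of the same graph plays the analogous role in the framework, and its plan would serve as high-level guidance for the LLM agent rather than as a replacement for it.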

Paper Structure

This paper contains 24 sections, 1 equation, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Agent+P compared with existing UI agents in UI automation. Agent+P improves performance by constructing a UI Transition Graph via program analysis and leveraging an external symbolic planner to generate a high-level, globally aware transition plan, thereby guiding the agent towards the automation goal.
  • Figure 2: Comparison of agent performance across three baselines.
  • Figure 3: Graphviz visualization of the UTG of Simple Calendar Pro in AndroidWorld. An activity is a unit of Android UI activity. Edge labels are removed for visual clarity.

Theorems & Definitions (5)

  • Definition 1: Widget, Action
  • Definition 2: UI
  • Definition 3: UI Transition Graph
  • Definition 4: UI Automation
  • Definition 5: Targeted UI Automation