Agent+P: Guiding UI Agents via Symbolic Planning
Shang Ma, Xusheng Xiao, Yanfang Ye
TL;DR
Agent+P tackles long-horizon UI automation by modeling an app's global screen transitions as a UI Transition Graph (UTG) and delegating navigation to an external symbolic planner, with the task encoded in PDDL, which produces provably correct and optimal plans. The framework is plug-and-play: it integrates with various LLM-based UI agents through four modules (UTG Builder, Node Selector, Plan Generator, and UI Explorer), giving the agent globally guided actions and reducing hallucinations. On the AndroidWorld benchmark, Agent+P improves success rates by up to 14% and reduces action steps by 37.7%, supporting the claim that symbolic planning curbs hallucinated actions and improves efficiency. The approach generalizes to other domains where state-transition graphs are available and points to opportunities for neuro-symbolic planning in UI automation and embodied AI. Acknowledged limitations include UTG accuracy, plan reliability under stochastic GUI behavior, and extension to multi-goal tasks.
Abstract
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks because they lack an understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This in turn enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, sparing the agent redundant exploration and guiding it toward the automation goal. AGENT+P is designed as a plug-and-play framework that enhances existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the number of action steps by 37.7%.
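
To make the pathfinding reformulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a UTG can be represented as a dictionary mapping each screen to its outgoing actions, and it substitutes a plain breadth-first search for the off-the-shelf symbolic planner. The screen names, action labels, and the `shortest_plan` helper are all hypothetical.

```python
from collections import deque

# Hypothetical UTG: each screen maps action labels to the screen they lead to.
# Screen and action names are invented for illustration only.
UTG = {
    "Home": {"tap_settings": "Settings", "tap_contacts": "Contacts"},
    "Settings": {"tap_display": "Display", "back": "Home"},
    "Contacts": {"tap_add": "NewContact", "back": "Home"},
    "Display": {"back": "Settings"},
    "NewContact": {"back": "Contacts"},
}


def shortest_plan(utg, start, goal):
    """Return a shortest list of actions from start to goal, or None.

    Breadth-first search stands in here for the symbolic planner; on an
    unweighted graph it returns a path with the fewest transitions.
    """
    if start == goal:
        return []
    # parent[screen] = (previous screen, action taken to reach screen)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for action, nxt in utg.get(node, {}).items():
            if nxt in parent:
                continue  # already reached by an equally short or shorter path
            parent[nxt] = (node, action)
            if nxt == goal:
                # Walk back through parents to rebuild the action sequence.
                plan = []
                cur = nxt
                while parent[cur] is not None:
                    prev, act = parent[cur]
                    plan.append(act)
                    cur = prev
                return plan[::-1]
            queue.append(nxt)
    return None  # goal is unreachable in this UTG


if __name__ == "__main__":
    print(shortest_plan(UTG, "Display", "NewContact"))
```

In this toy graph, an agent on the `Display` screen that needs `NewContact` receives a four-action plan instead of exploring screens one at a time. A PDDL-based planner plays the same role in the paper's setting, with the added ability to express preconditions and effects that a bare graph search does not capture.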
