AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance
Yuyang Zhao, Wentao Shi, Fuli Feng, Xiangnan He
TL;DR
This paper introduces AppAgent-Pro, a proactive GUI agent system that overcomes the limitations of reactive LLM-based agents by autonomously anticipating user needs and integrating information across multiple domains. It introduces a three-stage pipeline (Comprehension, Execution, Integration) and a deep execution mode that performs intent-driven, iterative cross-app information mining, including personalization via interaction histories. The authors demonstrate the approach with scenarios ranging from simple internal knowledge queries to multi-app orchestration (e.g., YouTube and Amazon) in a Streamlit-based demonstration, highlighting improvements in efficiency, depth, and personalization of information retrieval. The work suggests that proactive GUI agents can substantially reduce user cognitive load and enable richer, multimodal information acquisition in daily life, while acknowledging challenges like maintaining user control and robustness across evolving app ecosystems.
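To make the three-stage flow concrete, the following is a minimal sketch of how Comprehension, Execution, and Integration could be chained, with a deep mode that iterates on earlier findings. It is not taken from the AppAgent-Pro codebase: every name here (`Plan`, `comprehend`, `execute`, `integrate`, `run_pipeline`, the `video_app` and `shopping_app` keys) is hypothetical, the keyword rules stand in for what would presumably be an LLM call, and the app handlers are plain callables rather than real GUI automation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler type: takes a sub-query, returns a list of findings.
AppHandler = Callable[[str], List[str]]


@dataclass
class Plan:
    """Output of the Comprehension stage (assumed structure)."""
    query: str
    app_queries: Dict[str, str] = field(default_factory=dict)  # app -> sub-query


def comprehend(query: str, history: List[str]) -> Plan:
    """Comprehension: decide which apps may hold useful information.

    Keyword rules are a placeholder for model-based intent prediction;
    the interaction history is included to mimic personalization.
    """
    plan = Plan(query=query)
    text = " ".join([query] + history).lower()
    if "learn" in text:
        plan.app_queries["video_app"] = f"{query} tutorial"
    if "learn" in text or "buy" in text:
        plan.app_queries["shopping_app"] = f"{query} equipment"
    return plan


def execute(plan: Plan, apps: Dict[str, AppHandler],
            deep: bool = False, max_rounds: int = 2) -> Dict[str, List[str]]:
    """Execution: query each planned app once, or iteratively in deep mode."""
    results: Dict[str, List[str]] = {}
    for app, sub_query in plan.app_queries.items():
        handler = apps.get(app)
        if handler is None:
            continue  # app not available; skip rather than fail
        findings = handler(sub_query)
        if deep:
            # Deep mode (assumed behavior): refine the query with the
            # latest finding and search again, up to max_rounds in total.
            for _ in range(max_rounds - 1):
                if not findings:
                    break
                findings = findings + handler(f"{sub_query} {findings[-1]}")
        results[app] = findings
    return results


def integrate(plan: Plan, results: Dict[str, List[str]]) -> str:
    """Integration: merge per-app findings into one response."""
    lines = [f"Query: {plan.query}"]
    for app in sorted(results):
        lines.extend(f"[{app}] {item}" for item in results[app])
    return "\n".join(lines)


def run_pipeline(query: str, history: List[str],
                 apps: Dict[str, AppHandler], deep: bool = False) -> str:
    plan = comprehend(query, history)
    return integrate(plan, execute(plan, apps, deep=deep))
```

In this sketch, a query that triggers no app (an empty plan) falls through to a response with no app findings, which loosely mirrors the simple internal-knowledge scenario; a query such as "learn badminton" fans out to both stub apps, analogous to the YouTube and Amazon example.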
Abstract
Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to proactively anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby facilitating the acquisition of more comprehensive and intelligent information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life, leading to a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.
