AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance

Yuyang Zhao, Wentao Shi, Fuli Feng, Xiangnan He

TL;DR

This paper introduces AppAgent-Pro, a proactive GUI agent system that overcomes the limitations of reactive LLM-based agents by autonomously anticipating user needs and integrating information across multiple domains. The system uses a three-stage pipeline (Comprehension, Execution, Integration) and a deep execution mode that performs intent-driven, iterative cross-app information mining, including personalization via interaction histories. The authors demonstrate the approach with scenarios ranging from simple internal knowledge queries to multi-app orchestration (e.g., YouTube and Amazon) in a Streamlit-based demonstration, highlighting improvements in efficiency, depth, and personalization of information retrieval. The work suggests that proactive GUI agents can substantially reduce user cognitive load and enable richer, multimodal information acquisition in daily life, while acknowledging challenges like maintaining user control and robustness across evolving app ecosystems.

Abstract

Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to proactively anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby facilitating the acquisition of more comprehensive and intelligent information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life, leading to a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.

Paper Structure

This paper contains 12 sections, 4 figures.

Figures (4)

  • Figure 1: The architectural workflow of AppAgent-Pro, structured around a three-stage pipeline: Comprehension, Execution, and Integration. The workflow demonstrates how the system interprets and responds to open-ended user queries by proactively acquiring relevant knowledge, understanding user intent, performing appropriate actions, and integrating the results into coherent outputs.
  • Figure 2: A comparative illustration of the shallow and deep execution modes in AppAgent-Pro. The shallow execution mode focuses on immediate interactions with limited reasoning, while the deep execution mode leverages intent-driven planning and iterative refinement to provide accurate feedback to users.
  • Figure 3: No external application is required for simple queries. The left column presents the reasoning process and execution log; the middle column shows the integrated output of AppAgent-Pro, including relevant screenshots and textual responses; the right column displays the real-time mobile phone screen.
  • Figure 4: Proactive responses in dual-app usage. Building on the three-column layout, the left column here contains details such as file save paths and status updates; the middle column displays Amazon proactive results alongside the user query and context.
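
The three-stage pipeline described for Figure 1 and the shallow and deep execution modes described for Figure 2 can be illustrated with a rough sketch. This is not the authors' implementation: every name here (`Plan`, `Result`, `comprehend`, `execute`, `integrate`, the keyword table, and the `search` callback) is hypothetical, keyword matching stands in for LLM-based intent understanding, and the rule that more than one relevant app triggers deep execution is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; this is an illustrative sketch only.

@dataclass
class Plan:
    query: str
    apps: List[str]   # apps judged relevant during Comprehension
    deep: bool        # True selects the deep execution mode

@dataclass
class Result:
    app: str
    findings: List[str] = field(default_factory=list)

def comprehend(query: str, known_apps: Dict[str, List[str]]) -> Plan:
    """Comprehension: select apps whose keywords occur in the query."""
    words = query.lower().split()
    apps = [app for app, kws in known_apps.items()
            if any(k in words for k in kws)]
    # Assumption: more than one relevant app triggers deep execution.
    return Plan(query=query, apps=apps, deep=len(apps) > 1)

def execute(plan: Plan,
            search: Callable[[str, str], List[str]],
            max_rounds: int = 3) -> List[Result]:
    """Execution: one search per app (shallow) or iterative refinement (deep)."""
    results = []
    for app in plan.apps:
        res = Result(app=app)
        sub_query = plan.query
        rounds = max_rounds if plan.deep else 1
        for _ in range(rounds):
            hits = search(app, sub_query)
            new = [h for h in hits if h not in res.findings]
            if not new:
                break  # nothing new was found, so stop refining
            res.findings.extend(new)
            sub_query = new[-1]  # refine the next round with the latest finding
        results.append(res)
    return results

def integrate(plan: Plan, results: List[Result]) -> str:
    """Integration: merge per-app findings into one textual answer."""
    if not results:
        # Simple queries need no external app (compare Figure 3).
        return f"[internal knowledge] {plan.query}"
    lines = [f"Query: {plan.query}"]
    for r in results:
        lines.append(f"{r.app}: " + "; ".join(r.findings))
    return "\n".join(lines)
```

In this sketch, a query that matches both a video app and a shopping app would be planned as deep and searched over several rounds per app, while a query that matches no app skips Execution and is answered directly in the Integration step.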