MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions

Yuxuan Liu, Hongda Sun, Wei Liu, Jian Luan, Bo Du, Rui Yan

TL;DR

MobileSteward introduces an object-oriented, self-evolving multi-agent framework for cross-app mobile task automation, coordinating app-specific StaffAgents through a centralized StewardAgent. It features three modules—Dynamic Recruitment, Assigned Execution, and Adjusted Evaluation—and a Memory-Based Self-Evolution mechanism that updates Staff Expertise and Task Guideline memories from successful executions. The authors establish CAPBench to benchmark cross-app instructions and demonstrate that MobileSteward outperforms both single-agent and procedure-oriented multi-agent baselines, with robust improvements from memory-driven scheduling and error reflection. The work advances practical cross-app automation and provides a foundation for scalable, self-improving mobile assistants in real-world environments.

Abstract

Mobile phone agents can assist people in automating daily tasks on their phones and have emerged as a pivotal research focus. However, existing procedure-oriented agents struggle with cross-app instructions due to the following challenges: (1) complex task relationships, (2) diverse app environments, and (3) error propagation and information loss in multi-step execution. Drawing inspiration from object-oriented programming principles, we recognize that object-oriented solutions are more suitable for cross-app instructions. To address these challenges, we propose a self-evolving multi-agent framework named MobileSteward, which integrates multiple app-oriented StaffAgents coordinated by a centralized StewardAgent. We design three specialized modules in MobileSteward: (1) Dynamic Recruitment generates a scheduling graph guided by information flow to explicitly associate tasks among apps. (2) Assigned Execution assigns each task to an app-oriented StaffAgent equipped with app-specialized expertise to address the diversity between apps. (3) Adjusted Evaluation evaluates each execution to provide reflection tips or deliver key information, which alleviates error propagation and information loss during multi-step execution. To continuously improve the performance of MobileSteward, we develop a Memory-based Self-evolution mechanism, which summarizes the experience from successful executions. We establish the first English Cross-APP Benchmark (CAPBench) in a real-world environment to evaluate agents' capabilities in solving complex cross-app instructions. Experimental results demonstrate that MobileSteward achieves the best performance compared to both single-agent and multi-agent frameworks, highlighting its superiority in handling user instructions of diverse complexity.
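The interplay of the three modules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Task`, `run_schedule`, `maps_agent`, `messages_agent`, and `evaluate` are hypothetical, the StaffAgents are toy functions rather than LLM-driven agents, and the retry limit is an assumed parameter. The sketch only shows the control flow the abstract describes: a scheduling graph ordered by information flow, per-app execution, and an evaluation step that either forwards key information or feeds a reflection tip back into a retry.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """One app-oriented task in the scheduling graph (hypothetical structure)."""
    app: str
    description: str
    depends_on: List[str] = field(default_factory=list)


# Assumed signatures: a staff agent maps (description, inputs, tips) to an
# execution history; an evaluator maps (task, history) to
# (success, key information or reflection tip).
StaffAgent = Callable[[str, Dict[str, str], List[str]], str]
Evaluator = Callable[[Task, str], Tuple[bool, str]]


def run_schedule(tasks: Dict[str, Task], staff: Dict[str, StaffAgent],
                 evaluate: Evaluator, max_retries: int = 2) -> Dict[str, str]:
    # Order tasks so that every task runs after the tasks it depends on.
    graph = {name: set(task.depends_on) for name, task in tasks.items()}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        task = tasks[name]
        # Key information from predecessor tasks flows into this task.
        inputs = {dep: results[dep] for dep in task.depends_on}
        tips: List[str] = []
        for _ in range(max_retries + 1):
            history = staff[task.app](task.description, inputs, tips)
            ok, message = evaluate(task, history)
            if ok:
                results[name] = message  # key information for later tasks
                break
            tips.append(message)  # reflection tip for the next attempt
        else:
            raise RuntimeError(f"task {name!r} failed after retries")
    return results


# Toy agents standing in for app-specialized StaffAgents.
def maps_agent(description, inputs, tips):
    # Fails on the first attempt and succeeds once it has a reflection tip.
    return "found: Cafe Luna" if tips else "found: "


def messages_agent(description, inputs, tips):
    return f"sent: {inputs['find_cafe']}"


def evaluate(task, history):
    _, _, payload = history.partition(": ")
    return (bool(payload), payload or "no result produced")


tasks = {
    "find_cafe": Task(app="maps", description="Find a nearby cafe"),
    "send_msg": Task(app="messages", description="Send the cafe name",
                     depends_on=["find_cafe"]),
}
staff = {"maps": maps_agent, "messages": messages_agent}
results = run_schedule(tasks, staff, evaluate)
```

In this toy run the maps task is retried once after a reflection tip, and its extracted result is passed to the messaging task as input. The memory-based self-evolution step is omitted here.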

Paper Structure

This paper contains 30 sections, 5 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Current mobile phone agents suffer from the following challenges when solving cross-app instructions: (1) Complex Task Relationships; (2) Diverse App Environment; (3) Error Propagation and Information Loss in Multi-Step Execution.
  • Figure 2: MobileSteward consists of a centralized StewardAgent and several app-oriented StaffAgents. The framework integrates three modules: (1) Dynamic Recruitment: StewardAgent splits the instruction into app-oriented tasks and generates a StaffAgent scheduling graph; (2) Assigned Execution: StaffAgent automates the assigned task and returns the execution history; (3) Adjusted Evaluation: StewardAgent provides reflection tips for failed executions and summarizes successful executions to facilitate information transfer and adjust the subsequent schedule.
  • Figure 3: App categories
  • Figure 4: Task Statistics
  • Figure 5: Complexity analysis on CAPBench.
  • ...and 2 more figures