Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts

Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi

TL;DR

Prompt2Task addresses a key barrier to the wide adoption of UI task automation by converting unrestricted textual prompts into smartphone UI actions through a three-stage pipeline and a cooperative multi-agent system. The pipeline combines information collection, instruction generation, and operation mapping with an accumulated knowledge base (Historical Task Repository, Context Library, Instruction Set, Mobile Interaction Graph), and keeps a human in the loop so that reliability stays high while user intervention stays low. Task success rises from $22.28\%$ for the baseline to $95.24\%$ with Prompt2Task on 2,500 prompts, with about $0.69$ user interventions per new task, and user studies confirm usability and effectiveness for both skilled and unskilled users. Open-ended knowledge and continuous learning let Prompt2Task scale across apps and tasks, with practical applications in tutorial generation, smart assistance, and customer support. The paper also outlines future directions for handling GUI dynamics and reducing latency.

Abstract

UI task automation enables efficient task execution by simulating human interactions with graphical user interfaces (GUIs), without modifying the existing application code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present Prompt2Task, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing the corresponding automation tasks. Prompt2Task incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for task generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.24% with Prompt2Task, requiring an average of 0.69 user interventions for each new task. Prompt2Task presents promising applications in fields such as tutorial creation, smart assistance, and customer service.

Paper Structure

This paper contains 60 sections, 1 equation, 27 figures, and 15 tables.

Figures (27)

  • Figure 1: The workflow of Prompt2Task. In the first stage, users input task-related textual prompts, and the agent collects relevant textual information to plan the task. In the second stage, the agent converts the text into formally defined instructions. In the third stage, the agent maps these instructions to operations, ultimately executing the automation task on the smartphone. Users can intervene in the agents' decisions, especially when help is requested. Blue arrows guide the process from prompts to task automation, while green arrows indicate the knowledge accumulation through user interactions, thereby enhancing the future performance of the agents.
  • Figure 2: Components of a Prompt2Task model.
  • Figure 3: For the same textual prompt "post a new moment", the page state can affect the specific operations required.
  • Figure 4: System design of Prompt2Task. The blue solid lines represent the main flow of the task automation process, the grey dashed lines represent optional user intervention, the purple lines represent knowledge accumulation after task completion and how it improves each stage, and the red solid line denotes instances where the agent seeks user assistance. Each stage's workflow is detailed in its own figure: information collection (Figure 5), instruction generation, and operation mapping.
  • Figure 5: Information collection workflow. Task-related functions and step descriptions are collected by the Analysis Agent and Retrieval Agent. The color of each component within the prompt indicates its source, which is the element of the same color that points to the prompt with an arrow of that color. Gray arrows represent the normal task flow, while purple arrows indicate the accumulation of knowledge.
  • ...and 22 more figures
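
The three-stage workflow described in the Figure 1 caption can be pictured with a minimal Python sketch. This is only an illustration under loud assumptions, not the authors' implementation: the names `KnowledgeBase`, `collect_information`, `generate_instructions`, `map_operations`, and `run` are hypothetical, the comma-splitting "planner" stands in for the agents, and the dictionary stands in for the accumulated knowledge the paper describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class KnowledgeBase:
    # Hypothetical stand-in for accumulated knowledge:
    # maps a prompt to the operations that completed it earlier.
    history: Dict[str, List[str]] = field(default_factory=dict)


def collect_information(prompt: str) -> List[str]:
    """Stage 1 (toy version): split the prompt into step descriptions."""
    return [s.strip() for s in prompt.split(",") if s.strip()]


def generate_instructions(steps: List[str]) -> List[Tuple[str, str]]:
    """Stage 2 (toy version): turn each step into a (verb, target) pair."""
    instructions = []
    for step in steps:
        verb, _, target = step.partition(" ")
        instructions.append((verb.lower(), target))
    return instructions


def map_operations(
    instructions: List[Tuple[str, str]],
    ask_user: Callable[[str], str],
) -> List[str]:
    """Stage 3 (toy version): map instructions to operations.

    When a target is missing, request help from the user (assumed callback).
    """
    operations = []
    for verb, target in instructions:
        if not target:
            target = ask_user(verb)
        operations.append(f"{verb}({target!r})")
    return operations


def run(prompt: str, kb: KnowledgeBase, ask_user: Callable[[str], str]) -> List[str]:
    """Run the three stages, reusing stored knowledge when available."""
    if prompt in kb.history:
        return kb.history[prompt]
    steps = collect_information(prompt)
    instructions = generate_instructions(steps)
    operations = map_operations(instructions, ask_user)
    kb.history[prompt] = operations  # knowledge accumulation for later runs
    return operations
```

In this sketch, a second call to `run` with the same prompt returns the stored operations without invoking `ask_user`, which loosely mirrors how accumulated knowledge is meant to reduce user intervention on later tasks.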