GPTVoiceTasker: Advancing Multi-step Mobile Task Efficiency Through Dynamic Interface Exploration and Learning
Minh Duc Vu, Han Wang, Zhuang Li, Jieshan Chen, Shengdong Zhao, Zhenchang Xing, Chunyang Chen
TL;DR
GPTVoiceTasker addresses the inefficiency and command misinterpretation of mobile voice assistants by combining LLM-driven command understanding with a dynamic, history-informed on-device execution framework. For tasks it has not seen before, it explores the interface by collecting UI context, anonymising data, and prompting the LLM in a two-step process. For tasks it has seen before, it automates execution using a transition graph and semantic screen descriptions. Together these allow both novel and recurring tasks to be completed by voice. Key contributions include hierarchical UI knowledge collection, privacy-preserving prompt design with few-shot and Chain-of-Thought prompts, a shortest-path navigation engine, and a human-in-the-loop mechanism for continual refinement, all implemented on Android with GPT-4 and open-sourced. Empirical results show strong command parsing (EM ≈ 84–85%) and high multi-step task success (≈85.7%). A user study reports a 34.85% gain in task efficiency and favorable usability, indicating practical value for accessibility and everyday task automation on mobile devices.
Abstract
Virtual assistants have the potential to play an important role in helping users achieve different tasks. However, these systems face challenges in real-world usability, characterized by inefficiency and difficulty in grasping user intentions. Leveraging recent advances in Large Language Models (LLMs), we introduce GPTVoiceTasker, a virtual assistant designed to enhance user experience and task efficiency on mobile devices. GPTVoiceTasker interprets user commands and executes the relevant device interactions to streamline task completion. The system continually learns from historical user commands to automate subsequent usage, further improving execution efficiency. Our experiments confirm GPTVoiceTasker's strong command interpretation abilities and the precision of its task automation module. In our user study, GPTVoiceTasker boosted task efficiency in real-world scenarios by 34.85%, accompanied by positive participant feedback. We have made GPTVoiceTasker open-source, inviting further research into using LLMs for diverse tasks through prompt engineering and into leveraging user usage data to improve efficiency.
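The idea of reusing historical interactions to automate recurring tasks, via a transition graph and shortest-path navigation, can be illustrated with a small sketch. This is not the system's implementation: the function names (`record_transition`, `shortest_action_path`), the screen identifiers, and the action strings are all hypothetical, and breadth-first search is assumed here as one simple way to find a shortest path over unweighted screen transitions.

```python
from collections import deque


def record_transition(graph, src_screen, action, dst_screen):
    """Store one observed transition: doing `action` on `src_screen` led to `dst_screen`.

    `graph` maps screen -> {action: next_screen}. All names are illustrative.
    """
    graph.setdefault(src_screen, {})[action] = dst_screen
    graph.setdefault(dst_screen, {})  # make sure the destination is a known node


def shortest_action_path(graph, start, target):
    """Breadth-first search over recorded transitions.

    Returns the shortest list of actions leading from `start` to `target`,
    or None if no recorded path exists.
    """
    if start not in graph:
        return None
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        screen, actions = queue.popleft()
        if screen == target:
            return actions
        for action, next_screen in graph[screen].items():
            if next_screen not in visited:
                visited.add(next_screen)
                queue.append((next_screen, actions + [action]))
    return None


# Hypothetical usage history: two known routes to the "display" screen.
graph = {}
record_transition(graph, "home", "tap:Settings", "settings")
record_transition(graph, "settings", "tap:Display", "display")
record_transition(graph, "home", "tap:Search", "search")
record_transition(graph, "search", "tap:Display result", "display")
record_transition(graph, "display", "tap:Dark mode", "dark_mode")

path = shortest_action_path(graph, "home", "dark_mode")
print(path)
```

In this sketch, a previously seen target screen can be reached by replaying the returned actions directly, without querying the LLM at each step; when no path is recorded, the function returns `None`, which would correspond to falling back to exploration.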
