UINav: A Practical Approach to Train On-Device Automation Agents

Wei Li; Fu-Lin Hsu; Will Bishop; Folawiyo Campbell-Ajala; Max Lin; Oriana Riva

TL;DR

UINav presents a practical, demonstration-driven pipeline for training lightweight on-device UI automation agents. By coupling an agent with a referee model, employing macro actions, and applying demonstration augmentation and utterance masking, it achieves high task success with relatively few demonstrations and runs efficiently on mobile hardware. The approach demonstrates strong generalization across tasks and apps on MoTIF, and shows that a single multi-task agent can learn across diverse tasks with transfer learning benefits. This work offers a viable path toward accessible, low-cost automated UI interaction on mainstream devices, with important considerations for privacy and misuse.

Abstract

Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models, while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices yet achieve high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations UINav can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy.

Paper Structure

This paper contains 33 sections, 12 figures, and 3 tables.

Introduction
Related work
UI automation scripts.
AI-based automation.
Why is UI automation hard?
System design
Agent's neural network architecture
Referee model
Utterance masking
Increasing robustness and efficiency
Action validation and macro actions.
Demonstration augmentation.
System evaluation
Agent and referee accuracy
Demonstration effort
...and 18 more sections

Figures (12)

Figure 1: High-level architecture of UINav.
Figure 2: The neural network of the agent model.
Figure 3: Referee model compared to the MoTIF system using the MoTIF dataset.
Figure 4: Number of demonstrations in the training set collected for 43 tasks across 128 apps/websites.
Figure 5: Comparison between multi- and single-task agents with an increasing number of demonstrations.
...and 7 more figures
