How Smart Is Your GUI Agent? A Framework for the Future of Software Interaction
Sidong Feng, Chunyang Chen
TL;DR
The paper tackles the lack of a common vocabulary for GUI agent autonomy and proposes a structured framework called GUI Agent Autonomy Levels (GAL). It defines six levels from no automation to full automation to benchmark capabilities across GUI environments. The authors survey current prototypes, benchmarks, and industry deployments, and discuss challenges related to perception, generalization, state tracking, security, and privacy. They argue that GAL provides a clear evaluation scaffold and roadmap toward progressively more autonomous and trustworthy GUI agents while acknowledging that Level 5 remains a distant goal.
Abstract
GUI agents are rapidly becoming a new way of interacting with software, allowing people to navigate web, desktop and mobile interfaces without executing every step click by click. Yet the term "agent" is applied to systems with radically different degrees of autonomy, obscuring capability, responsibility and risk. We call for conceptual clarity through GUI Agent Autonomy Levels (GAL), a six-level framework that makes autonomy explicit and helps benchmark progress toward trustworthy software interaction.
