Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation
Tianci Xue, Zeyi Liao, Tianneng Shi, Zilu Wang, Kai Zhang, Dawn Song, Yu Su, Huan Sun
TL;DR
The paper addresses the robustness of computer-use agents (CUAs) in diverse, evolving digital environments by introducing ACuRL, an autonomous curriculum reinforcement learning framework that learns without human data. It grounds task generation in observed environment context, runs iterative curriculum RL guided by capability evaluation, and uses CUAJudge for reliable long-horizon reward signals, all supported by an infrastructure for scalable, asynchronous training. Across six real-world environments, ACuRL yields 4–22% gains in target environments while mitigating catastrophic forgetting. Analyses link this robustness to highly sparse parameter updates and to distinct adaptation patterns across model components. These findings point to a scalable path toward robust continual learning for desktop and web GUI agents in realistic, multi-environment settings.
Abstract
Real-world digital environments are highly diverse and dynamic. These characteristics cause agents to frequently encounter unseen scenarios and distribution shifts, making continual learning in specific environments essential for computer-use agents (CUAs). However, a key challenge lies in obtaining high-quality, environment-grounded agent data without relying on costly human annotation. In this work, we introduce ACuRL, an Autonomous Curriculum Reinforcement Learning framework that continually adapts agents to specific environments with zero human data. The agent first explores target environments to acquire initial experiences. During subsequent iterative training, a curriculum task generator leverages these experiences together with feedback from the previous iteration to synthesize new tasks tailored to the agent's current capabilities. To provide reliable reward signals, we introduce CUAJudge, a robust automatic evaluator for CUAs that achieves 93% agreement with human judgments. Empirically, our method effectively enables both intra-environment and cross-environment continual learning, yielding 4–22% performance gains without catastrophic forgetting on existing environments. Further analyses show highly sparse updates (e.g., 20% of parameters), which help explain the effective and robust adaptation. Our data and code are available at https://github.com/OSU-NLP-Group/ACuRL.
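The loop described above (explore, generate tasks from experiences and prior feedback, evaluate, update) can be illustrated with a toy sketch. This is not the ACuRL implementation: every name here (`explore`, `generate_tasks`, `judge`, `run_curriculum`) is hypothetical, the agent is reduced to a single integer `skill`, the judge is a threshold check standing in for an evaluator such as CUAJudge, and the RL update is replaced by a simple increment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    description: str
    difficulty: int


@dataclass
class Feedback:
    task: Task
    success: bool


def explore(environment: str) -> List[str]:
    """Hypothetical exploration phase: return observed environment context."""
    return [f"{environment}: observed screen {i}" for i in range(3)]


def generate_tasks(experiences: List[str], feedback: List[Feedback],
                   n_tasks: int = 4) -> List[Task]:
    """Toy curriculum generator (assumption): raise the difficulty when the
    previous iteration mostly succeeded, lower it otherwise."""
    if not feedback:
        level = 1
    else:
        rate = sum(f.success for f in feedback) / len(feedback)
        prev = feedback[0].task.difficulty
        level = prev + 1 if rate >= 0.5 else max(1, prev - 1)
    return [
        Task(f"task {i} grounded in '{experiences[i % len(experiences)]}'", level)
        for i in range(n_tasks)
    ]


def judge(task: Task, skill: int) -> bool:
    """Stand-in for an automatic evaluator: succeed if skill covers difficulty."""
    return skill >= task.difficulty


def run_curriculum(environment: str, iterations: int = 3,
                   skill: int = 1) -> Tuple[List[int], int]:
    experiences = explore(environment)
    feedback: List[Feedback] = []
    history: List[int] = []
    for _ in range(iterations):
        tasks = generate_tasks(experiences, feedback)
        feedback = [Feedback(t, judge(t, skill)) for t in tasks]
        # Placeholder for the RL update: improve only if some reward was earned.
        if any(f.success for f in feedback):
            skill += 1
        history.append(tasks[0].difficulty)
    return history, skill


if __name__ == "__main__":
    print(run_curriculum("spreadsheet app"))
```

In this toy setting the difficulty tracks the agent's capability: it rises while the agent keeps succeeding and stays at the lowest level when the agent earns no reward, which is the intuition behind tailoring tasks to current capabilities.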
