Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation

Tianci Xue, Zeyi Liao, Tianneng Shi, Zilu Wang, Kai Zhang, Dawn Song, Yu Su, Huan Sun

TL;DR

The paper addresses robustness of computer-use agents under diverse, evolving digital environments by introducing ACuRL, an autonomous curriculum reinforcement learning framework that learns without human data. It grounds task generation in observed environment context, employs iterative curriculum RL guided by capability evaluation, and uses CUAJudge for reliable long-horizon reward signals, all supported by an infrastructure for scalable, asynchronous training. Across six real-world environments, ACuRL yields 4–22% gains in target environments while mitigating catastrophic forgetting, aided by highly sparse parameter updates and distinct adaptation patterns between model components. These findings demonstrate a scalable path to robust, continual learning for desktop/web GUI agents in realistic, multi-environment settings.

Abstract

Real-world digital environments are highly diverse and dynamic. These characteristics cause agents to frequently encounter unseen scenarios and distribution shifts, making continual learning in specific environments essential for computer-use agents (CUAs). However, a key challenge lies in obtaining high-quality and environment-grounded agent data without relying on costly human annotation. In this work, we introduce ACuRL, an Autonomous Curriculum Reinforcement Learning framework that continually adapts agents to specific environments with zero human data. The agent first explores target environments to acquire initial experiences. During subsequent iterative training, a curriculum task generator leverages these experiences together with feedback from the previous iteration to synthesize new tasks tailored for the agent's current capabilities. To provide reliable reward signals, we introduce CUAJudge, a robust automatic evaluator for CUAs that achieves 93% agreement with human judgments. Empirically, our method effectively enables both intra-environment and cross-environment continual learning, yielding 4-22% performance gains without catastrophic forgetting on existing environments. Further analyses show highly sparse updates (e.g., 20% parameters), which helps explain the effective and robust adaptation. Our data and code are available at https://github.com/OSU-NLP-Group/ACuRL.
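The iterative procedure described in the abstract (explore, generate tasks from context and prior feedback, roll out, judge, update) can be sketched as a small loop. This is a toy illustration under stated assumptions, not the authors' implementation: the 1-5 difficulty scale, the 0.3/0.7 success-rate thresholds, the integer `skill` standing in for the policy, and every function name below are hypothetical. The judge here is a trivial stub in place of CUAJudge.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    description: str
    difficulty: int  # hypothetical 1..5 scale, not from the paper


def explore(env_name: str) -> List[str]:
    """Stand-in for autonomous exploration: returns observed context snippets."""
    return [f"{env_name}: menu bar", f"{env_name}: file dialog"]


def next_difficulty(current: int, success_rate: Optional[float]) -> int:
    """Adjust task difficulty from the previous iteration's feedback.

    The thresholds are illustrative assumptions.
    """
    if success_rate is None:  # first iteration: no feedback yet
        return current
    if success_rate > 0.7:
        return min(current + 1, 5)
    if success_rate < 0.3:
        return max(current - 1, 1)
    return current


def generate_tasks(context: List[str], difficulty: int, n: int = 4) -> List[Task]:
    """Stand-in for the curriculum task generator, grounded in explored context."""
    return [
        Task(f"[d={difficulty}] task grounded in '{context[i % len(context)]}'", difficulty)
        for i in range(n)
    ]


def run_and_judge(task: Task, skill: int) -> float:
    """Stand-in for a rollout plus an automatic judge: binary reward."""
    return 1.0 if skill >= task.difficulty else 0.0


def curriculum_loop(env_name: str, iterations: int = 3) -> List[Tuple[int, int, float]]:
    context = explore(env_name)
    difficulty, skill = 1, 1
    success_rate: Optional[float] = None
    history = []
    for it in range(iterations):
        difficulty = next_difficulty(difficulty, success_rate)
        tasks = generate_tasks(context, difficulty)
        rewards = [run_and_judge(t, skill) for t in tasks]
        success_rate = sum(rewards) / len(rewards)
        skill += 1  # placeholder for the RL policy update
        history.append((it, difficulty, success_rate))
    return history


if __name__ == "__main__":
    for row in curriculum_loop("LibreOffice Impress"):
        print(row)
```

The point of the sketch is the feedback path: each iteration's success rate feeds the next iteration's task generation, so difficulty tracks the agent's current capability.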

Paper Structure

This paper contains 32 sections, 5 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: The overview of our ACuRL framework. The agent first autonomously interacts with the environment to collect initial environment experience, and then undergoes iterative RL training to continually learn and adapt to target environments through curriculum tasks with difficulty levels tailored to the agent's current capabilities based on feedback from CUAJudge.
  • Figure 2: Sparsity of parameter updates across different iterations during iterative RL training, averaged over six environments.
  • Figure 3: Average number of words across three iterations for different environments.
  • Figure 4: Overlap ratio of significantly updated parameters between LibreOffice Impress and other environments across different layers.
  • Figure 5: Examples of LibreOffice Impress context.
  • ...and 1 more figure
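
Figures 2 and 4 concern the sparsity of parameter updates and the overlap of updated parameters between environments. The snippet below is a minimal sketch of how such quantities could be measured. The exact definitions used in the paper are not given in this summary, so the tolerance `tol`, the asymmetric overlap definition, and all function names are assumptions. Parameters are plain Python lists here so the example has no dependencies.

```python
from typing import Dict, List, Set, Tuple

Params = Dict[str, List[float]]


def updated_indices(before: Params, after: Params, tol: float = 1e-6) -> Set[Tuple[str, int]]:
    """Return (tensor name, index) pairs whose value changed by more than tol."""
    changed = set()
    for name, old in before.items():
        for i, (a, b) in enumerate(zip(old, after[name])):
            if abs(a - b) > tol:
                changed.add((name, i))
    return changed


def updated_fraction(before: Params, after: Params, tol: float = 1e-6) -> float:
    """Fraction of all parameters that changed; a lower value means sparser updates."""
    total = sum(len(v) for v in before.values())
    return len(updated_indices(before, after, tol)) / total


def overlap_ratio(a: Set[Tuple[str, int]], b: Set[Tuple[str, int]]) -> float:
    """Share of the parameters updated in `a` that were also updated in `b`."""
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    base = {"layer0": [0.1, 0.2, 0.3, 0.4, 0.5]}
    env_a = {"layer0": [0.1, 0.25, 0.3, 0.4, 0.5]}   # one value changed
    env_b = {"layer0": [0.1, 0.25, 0.3, 0.45, 0.5]}  # two values changed

    print(updated_fraction(base, env_a))  # 1 of 5 parameters changed
    print(overlap_ratio(updated_indices(base, env_a), updated_indices(base, env_b)))
```

In practice the same comparison would run over model checkpoints before and after an RL iteration, per layer, rather than over toy lists.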