OSGym: Super-Scalable Distributed Data Engine for Generalizable Computer Agents

Zengyi Qin, Jinyuan Chen, Yunze Man, Shengcao Cao, Ziqi Pang, Zhuoyuan Wang, Xin Sun, Gen Lin, Han Fang, Ling Zhu, Zixin Xie, Zibu Wei, Tianshu Ran, Haoran Geng, Xander Wu, Zachary Bright, Qizhen Sun, Rui Wang, Yuyang Cai, Song Wang, Jiace Zhao, Han Cao, Yeyang Zhou, Tianrui Liu, Ray Pan, Chongye Yang, Xiang Ren, Bo Zhang, Yutong Ban, Jitendra Malik, Pieter Abbeel

TL;DR

OSGym introduces a massively scalable, open-source data engine for training agents in full OS environments, addressing the generality, scalability, and cost barriers of OS-based agent research. It combines a fully decentralized OS state management design, hardware-aware orchestration, a centralized data server, and algorithm-agnostic training interfaces to run over a thousand parallel OS replicas at academia-friendly costs. Empirical results show near-linear scalability, robust self-recovery, and a practical end-to-end training pipeline that yields competitive results on OSWorld-like tasks with modest cloud spend. The framework lowers barriers to large-scale, realistic agent research and enables broad exploration of general computer-use capabilities across diverse tasks.

Abstract

We introduce OSGym, a super-scalable distributed data engine for training agents across diverse computer-related tasks. OSGym efficiently scales to over a thousand operating system (OS) replicas at an academia-affordable cost, serving as dynamic runtime environments for intelligent agents. It offers three key advantages. (1) Scalability: Despite the intensive resource requirements of running multiple OS replicas, OSGym parallelizes over a thousand instances while maintaining operational efficiency under constrained resources, generating up to 1420 multi-turn trajectories per minute. (2) Generality and Customizability: OSGym supports a broad spectrum of tasks that run on OS platforms, including tool use, browser interactions, software engineering, and office applications, with flexible support for diverse model training algorithms. (3) Economic Viability: OSGym operates at only 0.2-0.3 USD per day per OS replica using accessible on-demand compute providers. It is fully open-source and freely available for both research and commercial use. Experiments show that OSGym enables comprehensive data collection, supervised fine-tuning, and reinforcement learning pipelines for computer agents. Models trained with OSGym outperform state-of-the-art baselines, demonstrating its potential to advance scalability and universality in future agent research.
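To put the quoted per-replica price in context, the following minimal sketch scales the 0.2-0.3 USD per day figure to a fleet of replicas. The function name `daily_cost_range` and the fleet size of 1000 are illustrative assumptions, not part of OSGym; only the per-replica price range comes from the abstract.

```python
from typing import Tuple


def daily_cost_range(
    num_replicas: int, low_usd: float = 0.2, high_usd: float = 0.3
) -> Tuple[float, float]:
    """Return the (min, max) total USD per day for a fleet of OS replicas.

    The default per-replica prices are the range quoted in the abstract;
    everything else here is a hypothetical helper for illustration.
    """
    if num_replicas < 0:
        raise ValueError("num_replicas must be non-negative")
    return num_replicas * low_usd, num_replicas * high_usd


# Assumed fleet size of 1000 replicas, matching "over a thousand" only roughly.
low, high = daily_cost_range(1000)
print(f"Estimated fleet cost: {low:.0f}-{high:.0f} USD per day")
```

Under these assumptions, a fleet of about a thousand replicas would cost on the order of a few hundred USD per day.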

Paper Structure

This paper contains 12 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: OSGym Overview. OSGym decentralizes OS replica execution and state management to achieve high scalability, without sacrificing the average per-replica performance when scaling to a thousand replicas. A robust fault-tolerance mechanism ensures that failures in some replicas do not affect the rest of the system. OSGym supports any task that runs on an operating system, which is important for training general-purpose computer agents. A centralized data server exposes a single-entry interface to the user, hiding the underlying complexity. OSGym is algorithm-independent and compatible with customized training and evaluation loops. Lastly, OSGym can be deployed on any cloud provider and costs as little as 0.2 to 0.3 USD / replica / day (or is free when self-hosted), making it affordable for academic use.
  • Figure 2: Decentralized OS State Management. In centralized state management, a single manager manages all OS replicas. In semi-decentralized state management, OS replicas are split into groups, each controlled by a single manager. In decentralized state management, each OS replica has its own state manager. The state manager has public methods similar to those of OpenAI Gym, plus a set of private methods for low-level management of the state and health of OS replicas.
  • Figure 3: Hardware-Aware Optimization of OS Replica Orchestration. To cloud-deploy or self-host a large number of OS replicas, one may choose to host N replicas on N small servers, or on M large servers where each server hosts K = N / M replicas. The key insight is that for small K the scaling is CPU-bound, while for large K the scaling is RAM-bound (see the bottom-left plot), and RAM is much cheaper than CPU. We therefore increase the RAM of each server to use a large K, which significantly cuts the cost (see the bottom-right plot). The numbers following ± represent the standard deviation across 10 independent runs.
  • Figure 4: Diverse Tasks with Unified Flow. Since OSGym runs a full-fledged OS rather than a specialized sandbox, it naturally supports a wide variety of tasks, as long as the software involved runs on the OS. OSGym also unifies the operation flow: each task has four parts (configure, reset, operate, and evaluate), controlled by the public methods of the state manager.
  • Figure 5: Centralized Data Server with Easy-to-Use Single Entry. The data server exposes single-entry batched methods. The complexities of state manager communication and data queuing are managed internally by the data server. The batched step method is asynchronous, so the training or evaluation loop is not blocked.
  • ...and 2 more figures
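The captions for Figures 2, 4, and 5 describe a per-replica state manager with Gym-like public methods, a four-part task flow (configure, reset, operate, evaluate), and a data server with a non-blocking batched step. The toy sketch below illustrates how these pieces could fit together. It is not OSGym's actual API: `ToyStateManager`, `ToyDataServer`, and every method name and signature are hypothetical, the "OS" is simulated with strings, and a thread pool stands in for whatever asynchronous mechanism the real system uses.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


class ToyStateManager:
    """Hypothetical manager owning exactly one (simulated) OS replica."""

    def __init__(self, replica_id: int) -> None:
        self.replica_id = replica_id
        self.task: Dict[str, str] = {}
        self.actions: List[str] = []

    def configure(self, task: Dict[str, str]) -> None:
        self.task = task  # store the task description for this replica

    def reset(self) -> str:
        self.actions = []  # clear history and return an initial observation
        return f"replica {self.replica_id}: initial screen"

    def step(self, action: str) -> str:
        self.actions.append(action)  # the "operate" part of the flow
        return f"replica {self.replica_id}: after {action}"

    def evaluate(self) -> float:
        # Toy success check: the last action must equal the task's goal.
        done = bool(self.actions) and self.actions[-1] == self.task.get("goal")
        return 1.0 if done else 0.0


class ToyDataServer:
    """Hypothetical single entry point fanning batched calls out to managers."""

    def __init__(self, num_replicas: int) -> None:
        self.managers = [ToyStateManager(i) for i in range(num_replicas)]
        self.pool = ThreadPoolExecutor(max_workers=num_replicas)

    def reset_batch(self, tasks: List[Dict[str, str]]) -> List[str]:
        observations = []
        for manager, task in zip(self.managers, tasks):
            manager.configure(task)
            observations.append(manager.reset())
        return observations

    def step_batch(self, actions: List[str]) -> List[Future]:
        # Returns futures immediately, so the caller's loop is not blocked.
        return [self.pool.submit(m.step, a) for m, a in zip(self.managers, actions)]

    def evaluate_batch(self) -> List[float]:
        return [m.evaluate() for m in self.managers]

    def close(self) -> None:
        self.pool.shutdown(wait=True)


server = ToyDataServer(num_replicas=2)
server.reset_batch([{"goal": "click_ok"}, {"goal": "type_hello"}])
futures = server.step_batch(["click_ok", "click_cancel"])
observations = [f.result() for f in futures]  # collect results when needed
scores = server.evaluate_batch()
server.close()
print(observations, scores)
```

Because each manager in this sketch holds only its own replica's state, a failure in one manager would not corrupt the others, which is the property the decentralized design in Figure 2 aims for.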