AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

Chengxuan Lu, Shukuan Wang, Yanjie Li, Wei Liu, Shiji Jin, Fuyuan Qian, Peiming Li, Baigui Sun, Yang Liu

Abstract

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark demonstrate that AcceRL achieves state-of-the-art (SOTA) performance. At the system level, it exhibits super-linear scaling in throughput and highly efficient hardware utilization. At the algorithm level, the world-model-augmented variant delivers unprecedented sample efficiency and robust training stability in complex control tasks.
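To make the cost of a synchronization barrier concrete, the following is a minimal back-of-the-envelope sketch and not the AcceRL implementation. It assumes an idealized setting in which every rollout phase and every training step takes a fixed time, and it models the decoupled case as a simple two-stage pipeline in which rollout collection for the next batch overlaps with training on the current one. The function names and the timing values are hypothetical and chosen only for illustration.

```python
def sync_makespan(iters: int, t_rollout: float, t_train: float) -> float:
    """Synchronous loop: each iteration collects rollouts, then trains.

    The trainer idles during every rollout phase and vice versa.
    """
    return iters * (t_rollout + t_train)


def async_makespan(iters: int, t_rollout: float, t_train: float) -> float:
    """Idealized two-stage pipeline (an assumption, not the paper's scheduler).

    The first batch must be collected before training can start; after that,
    the slower of the two stages sets the pace, and the last training step
    runs once the final batch is available.
    """
    return t_rollout + (iters - 1) * max(t_rollout, t_train) + t_train


def trainer_utilization(iters: int, t_train: float, makespan: float) -> float:
    """Fraction of wall-clock time the trainer spends doing useful work."""
    return iters * t_train / makespan


if __name__ == "__main__":
    # Hypothetical timings (seconds), not measurements from the paper.
    iters, t_rollout, t_train = 100, 2.0, 2.0
    for name, fn in (("sync", sync_makespan), ("async", async_makespan)):
        total = fn(iters, t_rollout, t_train)
        util = trainer_utilization(iters, t_train, total)
        print(f"{name}: makespan={total:.1f}s trainer_utilization={util:.3f}")
```

Under these assumed timings the synchronous loop keeps the trainer busy half of the time, while the pipelined schedule approaches full utilization. A real asynchronous system also has to handle policy lag, batching delays, and variable episode lengths, which this sketch ignores.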

Paper Structure (29 sections, 16 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Overview of AcceRL. (Top Left) The framework's throughput exhibits super-linear scaling with the number of trainer GPUs, enabled by ZeRO (Rajbhandari et al., 2020) optimizations. (Bottom Left) Experimental results on the LIBERO benchmark demonstrate that AcceRL achieves SOTA performance across all evaluation categories. (Middle) A conceptual illustration of the world model in reinforcement learning, representing the agent "learning in imagination" to significantly boost data utilization. (Right) Performance comparison between model-based (AcceRL with WM) and model-free (AcceRL) approaches. By leveraging a world model pre-trained on 1,000 offline trajectories, the model-based architecture improves online sample efficiency by $200\times$.
  • Figure 2: Timeline of a synchronous RL framework (left) and the asynchronous framework AcceRL (right). The inference forward blocks in AcceRL encompass both the batching wait time and the inference time. By eliminating the GPU bubbles, AcceRL maximizes hardware utilization and significantly enhances overall training efficiency.
  • Figure 3: The base backbone.
  • Figure 4: The backbone augmented with a world model.
  • Figure 6: Scalability performance of the AcceRL framework with decoupled data labels. (a) Throughput scaling demonstrates near-linear performance up to 64 rollout workers. (b) Trainer scalability shows that the actual trainer SPS closely tracks the ideal marginal scaling curve when scaling up to 7 GPUs. Both configurations effectively maintain high hardware utilization without suffering from severe communication overhead.
  • ...and 3 more figures