AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

Chengxuan Lu; Shukuan Wang; Yanjie Li; Wei Liu; Shiji Jin; Fuyuan Qian; Peiming Li; Baigui Sun; Yang Liu

Abstract

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark demonstrate that AcceRL achieves state-of-the-art (SOTA) performance. At the system level, it exhibits super-linear scaling in throughput and highly efficient hardware utilization. At the algorithm level, the world-model-augmented variant delivers unprecedented sample efficiency and robust training stability in complex control tasks.

Paper Structure (29 sections, 16 equations, 8 figures, 2 tables)

Introduction
Related work
Evolution of Distributed Reinforcement Learning Frameworks
Training and Optimization of Vision-Language-Action Models
World Models and Model-Based Reinforcement Learning
AcceRL: A Fully Asynchronous Training Framework
Macro-Asynchrony: Decoupling Training and Rollout
Micro-Asynchrony: Decoupling Interaction and Inference
Addressing Policy Lag
Value Re-computation
Global Advantage Normalization
Gradient Calibration
Dynamic Weighted Resampling
Asynchronous Parallel Data Prefetching
AcceRL with World Model
...and 14 more sections

Figures (8)

Figure 1: Overview of AcceRL. (Top Left) The framework's throughput exhibits super-linear scaling with the number of trainer GPUs, enabled by ZeRO (Rajbhandari et al., 2020) optimizations. (Bottom Left) Experimental results on the LIBERO benchmark demonstrate that AcceRL achieves SOTA performance across all evaluation categories. (Middle) A conceptual illustration of the world model in reinforcement learning, representing the agent "learning in imagination" to significantly boost data utilization. (Right) Performance comparison between model-based (AcceRL with WM) and model-free (AcceRL) approaches. By leveraging a world model pre-trained on 1,000 offline trajectories, the model-based architecture improves online sample efficiency by $200\times$.
Figure 2: Timeline of a synchronous RL framework (left) and the asynchronous framework AcceRL (right). The inference forward blocks in AcceRL encompass both the batching wait time and the inference time. By eliminating the GPU bubbles, AcceRL maximizes hardware utilization and significantly enhances overall training efficiency.
Figure 3: the base backbone
Figure 4: the backbone augmented with world model
Figure 6: Scalability performance of the AcceRL framework with decoupled data labels. (a) Throughput scaling demonstrates near-linear performance up to 64 rollout workers. (b) Trainer scalability shows that the actual trainer SPS closely tracks the ideal marginal scaling curve when scaling up to 7 GPUs. Both configurations effectively maintain high hardware utilization without suffering from severe communication overhead.
...and 3 more figures
