TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

Qinwen Xu, Jiaming Liu, Rui Zhou, Shaojun Shi, Nuowei Han, Zhuoyang Liu, Chenyang Gu, Shuo Gu, Yang Yue, Gao Huang, Wenzhao Zheng, Sirui Han, Peng Jia, Shanghang Zhang

TL;DR

This work tackles the data-efficiency challenges of applying online RL to real-world Vision–Language–Action (VLA) robotic manipulation. It introduces TwinRL, a digital twin–real-world collaborative framework that reconstructs a high-fidelity digital twin from smartphone-captured scenes, uses it to expand the exploration space, and then runs parallel online RL in the twin to guide real-world learning. Key contributions include an exploration-space expansion strategy, a sim-to-real guided exploration pipeline that seeds real-world replay buffers, and targeted human-in-the-loop (HiL) rollouts informed by the twin to accelerate convergence. Across four manipulation tasks, TwinRL approaches 100% success in both in-distribution and out-of-distribution regions with about 20 minutes of real-world interaction on average, at least a 30% speedup over prior real-world RL methods.

Abstract

Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and insufficient real-world interaction. While online reinforcement learning (RL) has shown promise in improving general foundation models, applying RL to VLA manipulation in real-world settings is still hindered by low exploration efficiency and a restricted exploration space. Through systematic real-world experiments, we observe that the effective exploration space of online RL is closely tied to the data distribution of supervised fine-tuning (SFT). Motivated by this observation, we propose TwinRL, a digital twin-real-world collaborative RL framework designed to scale and guide exploration for VLA models. First, a high-fidelity digital twin is efficiently reconstructed from smartphone-captured scenes, enabling realistic bidirectional transfer between real and simulated environments. During the SFT warm-up stage, we introduce an exploration space expansion strategy using digital twins to broaden the support of the data trajectory distribution. Building on this enhanced initialization, we propose a sim-to-real guided exploration strategy to further accelerate online RL. Specifically, TwinRL performs efficient and parallel online RL in the digital twin prior to deployment, effectively bridging the gap between offline and online training stages. Subsequently, we exploit efficient digital twin sampling to identify failure-prone yet informative configurations, which are used to guide targeted human-in-the-loop rollouts on the real robot. In our experiments, TwinRL approaches 100% success in both in-distribution regions covered by real-world demonstrations and out-of-distribution regions, delivering at least a 30% speedup over prior real-world RL methods and requiring only about 20 minutes on average across four tasks.
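
The step of using digital twin sampling to find failure-prone configurations can be illustrated with a minimal sketch. It is not the authors' implementation: `ObjectConfig`, `toy_policy`, `twin_success_rate`, and `rank_failure_prone` are hypothetical names, the "twin" is replaced by a toy stochastic policy, and ranking purely by lowest estimated success rate is an assumed criterion (the abstract does not state how "informative" is scored).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectConfig:
    """Hypothetical object placement in the workspace (normalized coordinates)."""
    x: float
    y: float


def toy_policy(config: ObjectConfig, rng: random.Random) -> bool:
    """Stand-in for one twin rollout: succeeds with probability 1 - x.

    Assumption for illustration only: larger x means farther from the
    demonstrated (in-distribution) region, hence more failures.
    """
    return rng.random() < 1.0 - config.x


def twin_success_rate(policy, config, n_rollouts, rng) -> float:
    """Estimate the success rate at one configuration from repeated rollouts."""
    successes = sum(policy(config, rng) for _ in range(n_rollouts))
    return successes / n_rollouts


def rank_failure_prone(policy, configs, n_rollouts=20, k=3, seed=0):
    """Return the k configurations with the lowest estimated success rate.

    Each entry is a (success_rate, config) pair, worst first.
    """
    rng = random.Random(seed)
    scored = [(twin_success_rate(policy, c, n_rollouts, rng), c) for c in configs]
    scored.sort(key=lambda pair: pair[0])  # stable sort on success rate only
    return scored[:k]


if __name__ == "__main__":
    grid = [ObjectConfig(x / 4, 0.0) for x in range(5)]
    for rate, cfg in rank_failure_prone(toy_policy, grid, k=2):
        print(f"target x={cfg.x:.2f} estimated success={rate:.2f}")
```

In a setup like the one described, the selected configurations would be the ones a human operator sets up on the real robot for targeted rollouts; the cheap part (many rollouts per configuration) stays in simulation.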

Paper Structure (16 sections, 8 equations, 16 figures, 2 tables)

Figures (16)

  • Figure 1: Overview. (a) We propose TwinRL, a digital twin–real-world collaborative RL framework that expands the exploration space from in-distribution teleoperation data to out-of-distribution regions. TwinRL then performs efficient, parallel online RL in the digital twin to enable sim-to-real guided exploration, improving the convergence speed of real-world RL. (b) Across four tasks, TwinRL converges faster in online RL and approaches a 100% success rate, reaching this level in about 20 minutes on average in both in-distribution regions covered by real-world demonstrations and out-of-distribution regions. Since HiL-SERL does not include an SFT stage, its accuracy is reported only for the in-distribution region used by the other methods.
  • Figure 2: Exploration Bottlenecks. (a) We split the workspace into an in-distribution region (A) and an OOD region (B). Each region is defined by the manipulated object’s center location at task completion. (b) Heatmaps visualize the performance of different policies. (c) Learning curves show the online RL training dynamics of the A-only policy in both regions.
  • Figure 3: TwinRL. Stage I: Starting from human teleoperation, we introduce an exploration-space expansion strategy that synthesizes diverse digital-twin demonstrations to broaden SFT coverage. Stage II: The SFT-initialized policy is then trained with scalable, parallel online RL in the digital twin to harvest RL-style rollouts, which are transferred to initialize the real-world replay buffer and stabilize online learning. Stage III: During real-world online RL, the digital twin efficiently and continuously identifies failure-prone yet informative object configurations, which are then used to guide targeted HiL rollouts.
  • Figure 4: Real-world experimental setup. We consider four tasks, namely Pick-and-Place, Insert-Hexagon-Block, Insert-Triple-Column-Block, and Erase-Whiteboard, covering multi-step, precise, and contact-rich manipulation. The red and blue areas denote the in-distribution (ID) and out-of-distribution (OOD) evaluation regions, respectively.
  • Figure 5: Real-world Experiments. We report success-rate curves for online RL across four manipulation tasks under both ID and OOD settings. The y-axis shows success rate, and the x-axis reports both online training time and our model training steps.
  • ...and 11 more figures
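
Stage II in the Figure 3 caption, where twin rollouts initialize the real-world replay buffer, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's code: `Transition`, `ReplayBuffer`, and `seed_from_twin` are hypothetical names, and first-in-first-out eviction (so real transitions gradually displace twin data) is an assumed policy that the caption does not specify.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One (s, a, r, s', done) tuple, tagged with where it was collected."""
    obs: tuple
    action: tuple
    reward: float
    next_obs: tuple
    done: bool
    source: str  # "twin" or "real"


class ReplayBuffer:
    """Fixed-capacity FIFO buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.data.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        """Uniformly sample without replacement, capped at the buffer size."""
        return rng.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self) -> int:
        return len(self.data)


def seed_from_twin(buffer: ReplayBuffer, twin_transitions) -> int:
    """Pre-fill the real-world buffer with twin rollouts before real training."""
    for t in twin_transitions:
        buffer.add(t)
    return len(buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=4)
    twin = [Transition((i,), (0.0,), 0.0, (i + 1,), False, "twin") for i in range(3)]
    seed_from_twin(buf, twin)
    buf.add(Transition((9,), (0.1,), 1.0, (10,), True, "real"))
    print([t.source for t in buf.data])
```

With this arrangement the first real-world gradient updates draw on a non-empty buffer, which is one plausible reading of how seeding could stabilize early online learning.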