RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)

Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, Ping Luo

TL;DR

RoboTwin tackles the data scarcity problem in dual-arm robotic manipulation by coupling real-world teleoperation data with AI-generated digital twins created from a single 2D image. The framework uses generative models to construct high-fidelity 3D object representations, defines functional axes for grasping, and employs LLMs to generate expert data and task-specific scripts that drive trajectory planning. An open, real-to-sim benchmark is provided, including both simulated and real-world data, with an API to generate diverse expert demonstrations and an offline dataset for benchmarking. Experimental results with the 3D Diffusion Policy show that policies trained on RoboTwin-generated data and fine-tuned with limited real data achieve substantial gains over baselines, illustrating improved alignment between simulated training and real-world performance. Overall, RoboTwin offers a scalable pathway to robust dual-arm manipulation benchmarks and data-driven methods that generalize to real-world settings.

Abstract

In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial relation-aware code generation framework that combines object annotations with large language models to break down tasks, determine spatial constraints, and generate precise robotic movement code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance. We validated our approach using the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples improve the success rate by over 70% for single-arm tasks and by over 40% for dual-arm tasks compared to models trained solely on real-world data. This significant improvement demonstrates RoboTwin's potential to enhance the development and evaluation of dual-arm robotic manipulation systems. Project Page: https://robotwin-benchmark.github.io/early-version/.

Paper Structure (11 sections, 5 figures, 1 table)

This paper contains 11 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: RoboTwin Benchmark.
  • Figure 2: AIGC & Expert Data Generation pipeline. Object segmentation and a textual description are automatically extracted from a single RGB photo; 3D geometry, surface normals, wireframe, and texture maps are then generated to create a high-fidelity simulation object. Using the object's surface normals and pose information, grasping postures are decomposed and generated, and the large model's capabilities are leveraged to generate expert data for tasks in a zero-shot manner.
  • Figure 3: Points for function and contact, with axes pointing toward the functional part and along the approach direction.
  • Figure 4: Task Execution of RoboTwin Benchmark.
  • Figure 5: Illustration of our robot platform, with the capabilities for teleoperation, mobility, and data acquisition.
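
The annotations named in Figure 3 (a contact point, an axis toward the functional part, and an approach direction) can be turned into a gripper target with a few lines of vector arithmetic. The following is a minimal sketch only, not the authors' implementation: the function names `grasp_frame` and `pre_grasp_position`, the convention that the gripper z-axis follows the approach direction, and the `standoff` distance are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Return v scaled to unit length; reject zero-length input."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def grasp_frame(approach_axis, functional_axis):
    """Build a right-handed orthonormal frame (x, y, z).

    Assumed convention (hypothetical): z follows the approach direction,
    x is the part of the functional axis perpendicular to z.
    Raises ValueError if the two axes are parallel.
    """
    z = normalize(approach_axis)
    proj = dot(functional_axis, z)
    # Gram-Schmidt step: remove the component of the functional axis along z.
    x = normalize(tuple(f - proj * zc for f, zc in zip(functional_axis, z)))
    y = cross(z, x)
    return x, y, z


def pre_grasp_position(contact_point, approach_axis, standoff):
    """Back off from the contact point by `standoff` along the approach axis."""
    z = normalize(approach_axis)
    return tuple(p - standoff * zc for p, zc in zip(contact_point, z))


if __name__ == "__main__":
    # Top-down approach onto an object whose functional part points along +x.
    x, y, z = grasp_frame((0.0, 0.0, -1.0), (1.0, 0.0, 0.0))
    print("frame:", x, y, z)
    print("pre-grasp:", pre_grasp_position((0.3, 0.0, 0.1), (0.0, 0.0, -1.0), 0.05))
```

In this sketch the pre-grasp position sits above the contact point for a top-down approach, so a planner could move there first and then advance along the approach axis; how RoboTwin itself sequences these motions is not specified in the text above.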