DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation

Kefei Zhu, Fengshuo Bai, YuanHao Xiang, Yishuai Cai, Xinglin Chen, Ruochong Li, Xingtao Wang, Hao Dong, Yaodong Yang, Xiaopeng Fan, Yuanpei Chen

TL;DR

DexFlyWheel tackles the data bottleneck in dexterous manipulation with a scalable, self-improving data generation framework. It combines imitation learning, which captures human-like behavior, with residual reinforcement learning, which improves generalization, inside a flywheel loop whose data augmentation expands object, environment, and pose diversity across iterations. The framework generates over 2,000 demonstrations across four tasks; policies trained on this data reach an 81.9% average success rate on challenging test sets and transfer from simulation to a real dual-arm robot through a digital twin, with a 78.3% success rate on dual-arm lift tasks. The approach offers a practical, data-efficient route to scalable dexterous manipulation policies.

Abstract

Dexterous manipulation is critical for advancing robot capabilities in real-world applications, yet diverse and high-quality datasets remain scarce. Existing data collection methods rely on human teleoperation, require significant human engineering, or generate data with limited diversity, which restricts their scalability and generalization. In this paper, we introduce DexFlyWheel, a scalable data generation framework that employs a self-improving cycle to continuously enrich data diversity. Starting from an efficient warm-up on seed demonstrations, DexFlyWheel expands the dataset through iterative cycles. Each cycle follows a closed-loop pipeline that integrates Imitation Learning (IL), residual Reinforcement Learning (RL), rollout trajectory collection, and data augmentation. Specifically, IL extracts human-like behaviors from demonstrations, and residual RL enhances policy generalization. The learned policy is then used to generate trajectories in simulation, which are further augmented across diverse environments and spatial configurations before being fed back into the next cycle. Over successive iterations, a self-improving data flywheel effect emerges, producing datasets that cover diverse scenarios and thereby scaling policy performance. Experimental results demonstrate that DexFlyWheel generates over 2,000 diverse demonstrations across four challenging tasks. Policies trained on our dataset achieve an average success rate of 81.9% on the challenging test sets and successfully transfer to the real world through a digital twin, achieving a 78.3% success rate on dual-arm lift tasks.

Paper Structure

This paper contains 34 sections, 5 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Scaling dexterous manipulation data — DexFlyWheel generates diverse, high-quality dexterous manipulation data for challenging tasks. Our generated dataset enables policies to generalize to unseen scenarios and successfully transfer to the real world.
  • Figure 2: DexFlyWheel Framework Overview. The framework has two stages: a warm-up stage (left) and a self-improving data flywheel stage (right). In the warm-up stage, seed demonstrations from VR teleoperation are augmented to form the initial dataset $\mathcal{D}_1$. The data flywheel stage operates as a closed-loop cycle with four key components: (1) base policy $\pi_{\text{base}}$ training to capture human-like behaviors, (2) residual policy $\pi_{\text{res}}$ training to enhance generalization, (3) combined policy $\pi_{\text{combined}}$ rollouts to generate new trajectories, and (4) data augmentation to further diversify the dataset. As the flywheel iterates, both data diversity and policy capability continuously improve.
  • Figure 3: Experiment Setup, using the dual-arm robot system as an example. (a) Simulation environment. (b) Object diversity expansion across iterations, progressing from a single object (i=1) to geometrically similar objects (i=2) and then to objects with diverse geometries and physical properties (i=3). (c) Spatial diversity, showing the spatial arrangements. (d) Environment diversity, including variations in lighting conditions and tabletop appearances. (e) Real-world environment.
  • Figure 4: Ablation Study. Quantitative contribution of each module in DexFlyWheel across four manipulation tasks.
  • Figure 5: Comparison of Object Diversity. Our method successfully handles objects with diverse geometries, sizes, and categories.
  • ...and 4 more figures
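
The closed-loop cycle described in the Figure 2 caption can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions, not the authors' implementation: the function `run_flywheel` and its callable parameters (`train_base`, `train_residual`, `combine`, `rollout`, `augment`) are hypothetical names introduced here, and the sketch assumes that each iteration's augmented rollouts become the dataset for the next iteration, a detail the summary above does not spell out.

```python
from typing import Any, Callable, List


def run_flywheel(
    d1: List[Any],
    train_base: Callable[[List[Any]], Any],
    train_residual: Callable[[Any], Any],
    combine: Callable[[Any, Any], Any],
    rollout: Callable[[Any], List[Any]],
    augment: Callable[[List[Any]], List[Any]],
    num_iterations: int,
) -> List[Any]:
    """Run the four-step cycle for a fixed number of iterations.

    All callables are hypothetical placeholders supplied by the caller.
    `d1` stands for the initial dataset produced by the warm-up stage.
    """
    dataset = list(d1)
    for _ in range(num_iterations):
        pi_base = train_base(dataset)            # (1) imitation learning on current data
        pi_res = train_residual(pi_base)         # (2) residual RL on top of the base policy
        pi_combined = combine(pi_base, pi_res)   # combined policy used for rollouts
        new_trajectories = rollout(pi_combined)  # (3) collect trajectories in simulation
        # (4) augment and feed back; replacing (rather than extending) the
        # dataset is an assumption of this sketch.
        dataset = augment(new_trajectories)
    return dataset


if __name__ == "__main__":
    # Toy stand-ins so the loop can be executed; none of these model real training.
    final = run_flywheel(
        d1=["seed"],
        train_base=lambda data: ("base", len(data)),
        train_residual=lambda base: ("res", base),
        combine=lambda base, res: (base, res),
        rollout=lambda policy: ["traj0", "traj1", "traj2"],
        augment=lambda trajs: [t + s for t in trajs for s in ("", "_aug")],
        num_iterations=2,
    )
    print(len(final))
```

Passing the steps in as callables keeps the loop independent of any particular simulator or learning library, so each stage can be swapped out without touching the control flow.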