Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions
Yicheng Yang, Ruijiao Li, Lifeng Wang, Shuai Zheng, Shunzheng Ma, Keyu Zhang, Tuoyu Sun, Chenyun Dai, Jie Ding, Zhuo Zou
TL;DR
Dexterous manipulation with high-DoF arm-hand systems is data-hungry and prone to policy collapse. The authors bootstrap learning from AR-based remote human demonstrations using a two-phase framework: behavior cloning (BC) pretraining, followed by reinforcement learning (RL) augmented with contrastive learning. The RL component builds on SAC with two critics and adds a projection head, guided by a contrastive loss, to align agent and expert actions. In PyBullet simulations and real-world trials, it achieves faster convergence and higher task success than PPO and SAC baselines. An event-driven augmented reward is used for safety. Ablations show that BC pretraining reduces data needs and that the contrastive loss mitigates policy collapse, which suggests the approach is practical for scalable, data-efficient dexterous manipulation with AR teleoperation.
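To make the idea of aligning agent and expert actions through a projection head more concrete, the following is a minimal sketch of one common contrastive objective, InfoNCE. The summary above does not state the exact loss, so the choice of InfoNCE, the linear projection head, the pairing of the i-th agent action with the i-th expert action, and the names `project`, `info_nce`, `weights`, and `temperature` are all assumptions made for illustration.

```python
import math


def project(action, weights):
    """Hypothetical linear projection head followed by L2 normalisation."""
    z = [sum(w * a for w, a in zip(row, action)) for row in weights]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def info_nce(agent_actions, expert_actions, weights, temperature=0.1):
    """Mean InfoNCE loss over a batch (an assumed form, not the paper's).

    The i-th agent action is treated as the positive match for the i-th
    expert action; every other expert action in the batch is a negative.
    """
    za = [project(a, weights) for a in agent_actions]
    ze = [project(e, weights) for e in expert_actions]
    total = 0.0
    for i, a in enumerate(za):
        # Cosine similarity of this agent embedding to every expert embedding.
        logits = [sum(x * y for x, y in zip(a, e)) / temperature for e in ze]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(za)
```

Under these assumptions the loss is small when each agent action embeds close to its paired expert action and large when it embeds closer to a different expert action. In an RL setting such a term would typically be added to the actor loss with a weighting coefficient.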
Abstract
This paper focuses on scalable robot learning for manipulation in dexterous robot arm-hand systems, where remote human-robot interaction via augmented reality (AR) is used to collect expert demonstration data efficiently. For such a system, we present a unified framework that addresses general manipulation tasks. Specifically, the proposed method consists of two phases: i) in the first phase, pretraining, the policy is created by behavior cloning (BC) from the data collected with our AR-based remote human-robot interaction system; ii) in the second phase, a reinforcement learning (RL) method empowered by contrastive learning is developed to obtain a more efficient and robust policy than BC alone, with a projection head designed to accelerate learning. An event-driven augmented reward is adopted to enhance safety. To validate the proposed method, both physics simulations in PyBullet and real-world experiments are carried out. The results demonstrate that, compared to the classic proximal policy optimization and soft actor-critic policies, our method not only significantly speeds up inference but also achieves a much higher success rate on the manipulation tasks. An ablation study confirms that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.
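As an illustration of what an event-driven augmented reward could look like, the sketch below adds a penalty to the task reward only on steps where a safety-relevant event fires. The abstract does not list the events or their magnitudes, so the event names (`collision`, `joint_limit`, `object_dropped`), the penalty values, and the names `StepEvents` and `augmented_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """Hypothetical safety events detected during one environment step."""
    collision: bool = False
    joint_limit: bool = False
    object_dropped: bool = False


# Assumed penalty per event; these values are illustrative only.
PENALTIES = {"collision": -5.0, "joint_limit": -1.0, "object_dropped": -2.0}


def augmented_reward(task_reward: float, events: StepEvents) -> float:
    """Return the task reward plus the penalties of all events that fired."""
    penalty = sum(
        value for name, value in PENALTIES.items() if getattr(events, name)
    )
    return task_reward + penalty
```

For example, with these assumed values, `augmented_reward(1.0, StepEvents(collision=True))` evaluates to `-4.0`, while a step with no events returns the task reward unchanged.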
