Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions

Yicheng Yang, Ruijiao Li, Lifeng Wang, Shuai Zheng, Shunzheng Ma, Keyu Zhang, Tuoyu Sun, Chenyun Dai, Jie Ding, Zhuo Zou

TL;DR

Dexterous manipulation with high-DoF arm-hand systems is data-hungry, and the learned policies are prone to collapse. The authors bootstrap learning with expert demonstrations collected through AR-based remote teleoperation and train in two phases: behavior cloning (BC) pretraining, followed by reinforcement learning (RL) augmented with contrastive learning. The RL component builds on soft actor-critic (SAC) with two critics and adds a projection head guided by a contrastive loss that aligns agent actions with expert actions. In PyBullet simulation and real-world trials, the method converges faster and reaches a higher task success rate than proximal policy optimization (PPO) and SAC baselines. The approach also employs an event-driven augmented reward for safety. Ablations show that BC pretraining reduces data needs and that the contrastive loss mitigates policy collapse, which suggests the approach is practical for scalable, data-efficient dexterous manipulation with AR teleoperation.

Abstract

This paper focuses on scalable robot learning for manipulation with dexterous robot arm-hand systems, where remote human-robot interactions via augmented reality (AR) are established to collect expert demonstration data for improved efficiency. For such a system, we present a unified framework to address the general manipulation task problem. The proposed method consists of two phases: i) in the first phase, for pretraining, the policy is created by behavior cloning (BC), leveraging the learning data from our AR-based remote human-robot interaction system; ii) in the second phase, a contrastive-learning-empowered reinforcement learning (RL) method is developed to obtain a more efficient and robust policy than BC alone, and a projection head is designed to accelerate learning. An event-driven augmented reward is adopted to enhance safety. To validate the proposed method, both physics simulations in PyBullet and real-world experiments are carried out. The results demonstrate that, compared to the classic proximal policy optimization (PPO) and soft actor-critic (SAC) policies, our method not only significantly speeds up the inference but also achieves a much higher success rate on the manipulation tasks. The ablation study confirms that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.
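
The two training signals named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact losses, so the code assumes a mean-squared-error BC loss and an InfoNCE-style contrastive loss, a common choice for aligning paired samples. The function names (`project`, `bc_loss`, `info_nce`), the linear projection head `W`, and the `temperature` value are all hypothetical.

```python
import math


def project(W, x):
    """Hypothetical linear projection head followed by L2 normalisation."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid division by zero
    return [v / norm for v in z]


def bc_loss(policy_actions, expert_actions):
    """Assumed BC objective: mean squared error between policy and expert actions."""
    total = 0.0
    for pa, ea in zip(policy_actions, expert_actions):
        total += sum((p - e) ** 2 for p, e in zip(pa, ea))
    return total / len(policy_actions)


def info_nce(agent_actions, expert_actions, W, temperature=0.1):
    """Assumed contrastive objective (InfoNCE).

    The i-th expert action is the positive for the i-th agent action;
    every other expert action in the batch serves as a negative.
    """
    za = [project(W, a) for a in agent_actions]
    ze = [project(W, e) for e in expert_actions]
    total = 0.0
    for i, zi in enumerate(za):
        logits = [sum(p * q for p, q in zip(zi, zj)) / temperature for zj in ze]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(za)


# Toy batch: three 3-D "actions" and an identity projection head.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
expert = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [list(a) for a in expert]               # agent matches expert
misaligned = [expert[1], expert[2], expert[0]]    # agent paired with wrong expert

aligned_loss = info_nce(aligned, expert, W)
misaligned_loss = info_nce(misaligned, expert, W)
```

In this toy batch the contrastive loss is smaller when each agent action matches its paired expert action than when the pairs are mismatched, which is the alignment pressure the projection head is described as providing. In the setup the abstract describes, such a term would be combined with the SAC objective during the second phase; how the terms are weighted is not stated in this section.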

Paper Structure (8 sections, 12 equations, 8 figures, 3 tables, 1 algorithm)

This paper contains 8 sections, 12 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Illustration of the AR-based system for collecting learning data: the expert wears an AR headset to teleoperate the dexterous robot in manipulation tasks, and the expert demonstration data are collected for model training.
  • Figure 2: Design structure of the proposed algorithm: unlike existing algorithms, the expert demonstration data are leveraged both for behavior-cloning-based pretraining and for the augmented RL policy, with the help of a projection head.
  • Figure 3: The proposed projection head for improving sample efficiency and preventing policy collapse in the actor network.
  • Figure 4: Dexterous robot manipulation tasks in the simulation environment and in the real world.
  • Figure 5: Rollouts of object grasping with the proposed algorithm: functional grasps are achieved at the end.
  • ...and 3 more figures