SynH2R: Synthesizing Hand-Object Motions for Learning Human-to-Robot Handovers

Sammy Christen, Lan Feng, Wei Yang, Yu-Wei Chao, Otmar Hilliges, Jie Song

TL;DR

This work addresses the scalability bottleneck of motion capture in vision-based human-to-robot (H2R) handovers by introducing a pipeline that synthesizes hand-object motions. A controllable grasp optimizer generates pre-grasp and grasp poses conditioned on the handover direction; combined with an augmented D-Grasp policy, these poses yield diverse handover motions for a large object set. The synthesized data is used to train a vision-based H2R policy in simulation, which performs competitively with mocap-based baselines, generalizes better to unseen objects on a large synthetic test set, and transfers successfully from simulation to a real robot. The results suggest that synthetic handover data can replace or augment real mocap data for training robotic handover policies, making it feasible to scale to more object types and grasp strategies. The study also introduces a large synthetic test set for evaluating generalization across thousands of unseen objects.

Abstract

Vision-based human-to-robot handover is an important and challenging task in human-robot interaction. Recent work has attempted to train robot policies by interacting with dynamic virtual humans in simulated environments, where the policies can later be transferred to the real world. However, a major bottleneck is the reliance on human motion capture data, which is expensive to acquire and difficult to scale to arbitrary objects and human grasping motions. In this paper, we introduce a framework that can generate plausible human grasping motions suitable for training the robot. To achieve this, we propose a hand-object synthesis method that is designed to generate handover-friendly motions similar to humans. This allows us to generate synthetic training and testing data with 100x more objects than previous work. In our experiments, we show that our method trained purely with synthetic data is competitive with state-of-the-art methods that rely on real human motion data both in simulation and on a real system. In addition, we can perform evaluations on a larger scale compared to prior work. With our newly introduced test set, we show that our model can better scale to a large variety of unseen objects and human motions compared to the baselines. Project page: https://eth-ait.github.io/synthetic-handovers/

Paper Structure (17 sections, 1 equation, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of our framework. We train a robot to perform human-to-robot handovers using synthetic human motions. We transfer to a real robot and evaluate on a large synthetic test set of unseen objects and human motions.
  • Figure 2: Method Overview. Our framework contains a handover motion generation stage and an H2R handover training stage.