Modeling Dynamic Hand-Object Interactions with Applications to Human-Robot Handovers
Sammy Christen
TL;DR
This dissertation advances the modeling of dynamic hand-object interactions (HOI) by introducing two core tasks, dynamic grasp synthesis (D-Grasp) and bimanual manipulation of articulated objects (ArtiGrasp), both addressed with reinforcement learning in a physics-based simulation. It then applies these HOI synthesis capabilities to human-to-robot handovers, proposing end-to-end training frameworks that leverage simulated human-in-the-loop data and synthetic motion generation (SynH2R) to scale training diversity. A two-stage teacher-student framework enables sim-to-real transfer of handover policies, and user studies show that policies trained on purely synthetic data perform on par with those trained on real motion data, underscoring the feasibility of synthetic data for scalable human-robot collaboration. SynH2R generates large-scale synthetic handover motions, which enables robust generalization to unseen objects and motions, and real-robot experiments demonstrate promising sim-to-real transfer. Overall, the dissertation demonstrates physically plausible 4D HOI synthesis and its practical utility for scalable, human-aware robotic systems in embodied AI, human-robot interaction (HRI) training, and data augmentation for perception and planning.
Abstract
Humans frequently grasp, manipulate, and move objects. Interactive systems assist humans in these tasks, enabling applications in Embodied AI, human-robot interaction, and virtual reality. However, current hand-object interaction synthesis methods often neglect dynamics and focus on generating static grasps. The first part of this dissertation introduces dynamic grasp synthesis, where a hand grasps and moves an object to a target pose. We approach this task using physical simulation and reinforcement learning. We then extend this to bimanual manipulation and articulated objects, which requires fine-grained coordination between hands. In the second part of this dissertation, we study human-to-robot handovers. We integrate captured human motion into simulation and introduce a student-teacher framework that adapts to human behavior and transfers from sim to real. To overcome data scarcity, we generate synthetic interactions, increasing training diversity by 100x. Our user study finds no difference between policies trained on synthetic and on real motions.
