Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

TL;DR

HOIC tackles the challenge of reconstructing physically plausible hand-object interactions in real time from limited single-view RGBD data. It combines object compensation control with a surface contact model within a PPO-based imitation learning framework, using mimic and physics rewards to jointly guide hand motion and object dynamics. The approach achieves tracking accuracy comparable to a vision-based kinematic baseline while substantially improving physical plausibility: it reduces penetration and smooths interaction motion across three objects in a real-time pipeline. This work advances physics-aware HOI reconstruction and lays groundwork for more realistic human–robot interaction under constrained sensing conditions.
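The TL;DR says the policy is trained with a mimic reward and a physics reward. The summary does not give their formulas, so the sketch below is only an illustration under stated assumptions: the mimic term uses an exponentiated squared tracking error (a common choice in motion-imitation RL), the physics term rewards a small residual compensation wrench, and the weights and gains (`w_mimic`, `w_phys`, `k_hand`, `k_obj`, `k_phys`) are hypothetical values. The object pose is simplified to a position vector.

```python
import math


def mimic_reward(q, q_ref, o, o_ref, k_hand=2.0, k_obj=5.0):
    """Hypothetical tracking reward: exp(-k * squared error) for hand and object."""
    hand_err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    obj_err = sum((a - b) ** 2 for a, b in zip(o, o_ref))
    return 0.5 * math.exp(-k_hand * hand_err) + 0.5 * math.exp(-k_obj * obj_err)


def physics_reward(residual, k_phys=1.0):
    """Hypothetical physics reward: 1.0 when no residual compensation is needed."""
    return math.exp(-k_phys * sum(w ** 2 for w in residual))


def total_reward(q, q_ref, o, o_ref, residual, w_mimic=0.7, w_phys=0.3):
    """Weighted sum r_t = w_mimic * r_mimic + w_phys * r_phys (weights assumed)."""
    return w_mimic * mimic_reward(q, q_ref, o, o_ref) + w_phys * physics_reward(residual)


q_ref = [0.1, 0.4, 0.2]  # toy reference hand joint angles
o_ref = [0.0, 0.0, 0.3]  # toy reference object position
# Perfect tracking with no residual compensation earns the maximum reward ...
print(total_reward(q_ref, q_ref, o_ref, o_ref, [0.0] * 6))
# ... while a leftover compensation force lowers it even if tracking is perfect.
print(total_reward(q_ref, q_ref, o_ref, o_ref, [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]))
```

Keeping both terms in one scalar lets a standard PPO learner trade tracking accuracy against physical plausibility; the relative weights decide how much residual "cheating" force the policy is willing to use.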

Abstract

Hand-object manipulation is an important type of interaction in daily activities. We faithfully reconstruct this motion from a single RGBD camera using a novel deep reinforcement learning method that leverages physics. First, we propose object compensation control, which establishes direct control over the object to make network training more stable. Second, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving reconstruction accuracy and physical correctness. Experiments indicate that, without relying on any heuristic physical rules, this work successfully incorporates physics into the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.
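To make "direct object control" concrete, the toy sketch below adds a policy-predicted compensation force and torque to a free rigid body's equations of motion alongside the wrench from hand contacts. With such a term, the object can follow its reference even before the hand has learned a stable grasp. This is not the authors' simulator: it assumes a single body with isotropic (scalar) inertia, gravity along -z, and semi-implicit Euler integration, and `ObjectState` and `step_object` are hypothetical names.

```python
from dataclasses import dataclass

GRAVITY = (0.0, 0.0, -9.81)  # assumed world frame with z pointing up


@dataclass
class ObjectState:
    pos: tuple      # object center of mass (m)
    vel: tuple      # linear velocity (m/s)
    ang_vel: tuple  # angular velocity (rad/s)


def step_object(s, mass, inertia, f_contact, tau_contact, f_comp, tau_comp, dt=1.0 / 120.0):
    """One semi-implicit Euler step of a free rigid body.

    f_contact / tau_contact: net wrench the simulated hand exerts through contacts.
    f_comp / tau_comp: compensation wrench predicted by the policy and applied
    directly to the object (the "direct object control" idea, simplified).
    inertia is a scalar, i.e. an isotropic body such as a solid sphere, so the
    gyroscopic term w x (I w) vanishes; orientation is not tracked here.
    """
    force = tuple(fc + fk + mass * g for fc, fk, g in zip(f_contact, f_comp, GRAVITY))
    torque = tuple(tc + tk for tc, tk in zip(tau_contact, tau_comp))
    vel = tuple(v + dt * f / mass for v, f in zip(s.vel, force))
    ang_vel = tuple(w + dt * t / inertia for w, t in zip(s.ang_vel, torque))
    pos = tuple(p + dt * v for p, v in zip(s.pos, vel))  # uses the updated velocity
    return ObjectState(pos, vel, ang_vel)


m = 0.2
zero = (0.0, 0.0, 0.0)
rest = ObjectState((0.0, 0.0, 0.1), zero, zero)
# No hand contact yet: without compensation the object starts to fall ...
falling = step_object(rest, m, 1e-4, f_contact=zero, tau_contact=zero, f_comp=zero, tau_comp=zero)
# ... while a compensation force of -m*g holds it in place.
held = step_object(rest, m, 1e-4, f_contact=zero, tau_contact=zero,
                   f_comp=(0.0, 0.0, m * 9.81), tau_comp=zero)
print(falling.vel[2], held.vel[2])
```

In this reading, training can start from a regime where the compensation wrench does most of the work, and a physics-based penalty on that wrench then pushes the hand to take over.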

Paper Structure

This paper contains 18 sections, 16 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: System Overview. Our system takes the kinematic tracking results of hand-object interaction $\{\boldsymbol{\tilde{q}}_{t}, \boldsymbol{\tilde{o}}_{t}\}$ as input and outputs a refined version $\{\boldsymbol{q}_{t}, \boldsymbol{o}_{t}\}$. The policy network $\pi$ takes $\{\boldsymbol{q}_{t}, \boldsymbol{o}_{t}\}$ and $\{\boldsymbol{\tilde{q}}_{t+i}, \boldsymbol{\tilde{o}}_{t+i}\}$ as input and predicts the control signals $\{\boldsymbol{\tau}_t, \boldsymbol{f}_t^c, \boldsymbol{\tau}_t^c\}$, which are fed into a physics simulator to obtain $\{\boldsymbol{q}_{t+1}, \boldsymbol{o}_{t+1}\}$. During training, the compensation force and torque $\{\boldsymbol{f}_t^c, \boldsymbol{\tau}_t^c\}$ are used to upgrade the contact model from point to surface contact in a Surface Contact Modeling step, and the remaining residual is used to construct the physics reward $r_t^{\text{phys}}$. A mimic reward $r_t^{\text{mimic}}$ is also used to train the policy network $\pi$.
  • Figure 2: Our physical hand model (Left), object mesh reconstructed from kinematic tracking (Middle), and convex object collision mesh (Right).
  • Figure 3: Qualitative comparison with zhang2021single and hu2022physical
  • Figure 4: Training curve of our method with (orange) and without (blue) object compensation control. The y-axis shows the total reward obtained by the policy over all episodes within one epoch, min-max normalized using the maximum and minimum total rewards across all epochs.
  • Figure 5: Qualitative evaluation of our surface contact model compared with two alternative designs: simply requiring small compensation forces (w/o Contact Model) and a point contact model. Our surface contact model captures hand-object interaction more precisely, particularly in challenging scenarios with rapidly changing contact states (Top). It also incorporates torsional and rolling friction torques, enabling more precise imitation of the objects' rotational motion (Bottom).
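The captions for Figures 1 and 5 say that the compensation torque is used to upgrade point contact to surface contact (adding torsional and rolling friction), and that the residual feeds the physics reward. They do not say how the split is computed, so the following is a toy, single-contact sketch of one plausible reading. The part of the compensation torque that torsional and rolling friction at an active contact could supply, bounded by a coefficient times the normal force, is attributed to the surface contact. Whatever exceeds those bounds is the residual. The function name, the coefficients `mu_torsion` and `mu_roll`, and the independent clamping of the two components are assumptions, not the paper's formulation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def clamp_norm(v, limit):
    """Shrink v so that its Euclidean norm does not exceed limit."""
    n = math.sqrt(dot(v, v))
    return v if n <= limit else scale(v, limit / n)


def surface_contact_residual(tau_comp, normal, f_normal, mu_torsion=0.01, mu_roll=0.005):
    """Split a compensation torque into a surface-contact part and a residual.

    normal: unit contact normal; f_normal: normal force magnitude (>= 0).
    mu_torsion / mu_roll: hypothetical torsional and rolling friction
    coefficients (units of length), so the torque limits scale with f_normal.
    """
    tau_torsion = scale(normal, dot(tau_comp, normal))  # spin about the normal
    tau_rolling = sub(tau_comp, tau_torsion)            # rotation about tangent axes
    explained = tuple(
        a + b
        for a, b in zip(
            clamp_norm(tau_torsion, mu_torsion * f_normal),
            clamp_norm(tau_rolling, mu_roll * f_normal),
        )
    )
    return explained, sub(tau_comp, explained)


n = (0.0, 0.0, 1.0)
# A small spin torque is fully covered by torsional friction (residual ~ 0) ...
print(surface_contact_residual((0.0, 0.0, 0.05), n, f_normal=10.0))
# ... a larger one is only partly covered, and the rest remains as residual.
print(surface_contact_residual((0.0, 0.0, 0.3), n, f_normal=10.0))
```

Under this reading, a point contact model corresponds to `mu_torsion = mu_roll = 0`, where every torque about the contact must be treated as residual. This illustrates why the surface model can imitate object rotation with less reliance on compensation.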