InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills

Dayang Liang, Yuhang Lin, Xinzhe Liu, Jiyuan Shi, Yunlong Liu, Chenjia Bai

TL;DR

This work develops InterReal, a unified physics-based imitation learning framework for Real-world human-object Interaction (HOI) control that enables humanoid robots to track HOI reference motions, facilitating the learning of fine-grained interactive skills and their deployment in real-world settings.

Abstract

Interaction is one of the core abilities of humanoid robots. However, most existing frameworks focus on non-interactive whole-body control, which limits their practical applicability. In this work, we develop InterReal, a unified physics-based imitation learning framework for Real-world human-object Interaction (HOI) control. InterReal enables humanoid robots to track HOI reference motions, facilitating the learning of fine-grained interactive skills and their deployment in real-world settings. Within this framework, we first introduce an HOI motion data augmentation scheme with hand-object contact constraints and use the augmented motions to improve policy stability under object perturbations. Second, we propose an automatic reward learner to address the challenge of large-scale reward shaping: a meta-policy, guided by critical tracking error metrics, explores and allocates reward signals to the low-level reinforcement learning objective, enabling more effective learning of interactive policies. Experiments on the box-picking and box-pushing HOI tasks show that InterReal achieves the best tracking accuracy and the highest task success rate among the recent baselines it is compared against. Furthermore, we validate the framework on a real Unitree G1 humanoid robot, demonstrating its practical effectiveness and robustness beyond simulation.
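The abstract does not spell out how the hand-object contact constraint is enforced during augmentation. As a minimal sketch under our own assumptions, one way to keep contact intact is to apply a random rigid perturbation (a horizontal shift plus a yaw rotation) to the object and carry each hand target along with it, so that each hand's offset in the object's local frame is unchanged. `HOIFrame`, `perturb_frame`, `augment`, and the default ranges (0.05 m, 0.2 rad) are hypothetical; the sketch adjusts only hand targets and leaves whole-body consistency (for example, re-solving the body pose by retargeting or inverse kinematics) out of scope.

```python
import math
import random
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class HOIFrame:
    """One frame of a hypothetical HOI reference; all positions are in the world frame."""
    obj_pos: Vec3
    obj_yaw: float  # object heading about the vertical z-axis, in radians
    left_hand: Vec3
    right_hand: Vec3


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def rotate_z(v: Vec3, angle: float) -> Vec3:
    """Rotate a 3-vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def perturb_frame(frame: HOIFrame, d_pos: Vec3, d_yaw: float) -> HOIFrame:
    """Shift and rotate the object, carrying both hand targets with it.

    Each hand keeps the same offset in the object's local frame, so the
    hand-object contact of the reference frame is preserved.
    """
    new_obj = add(frame.obj_pos, d_pos)

    def carry(hand: Vec3) -> Vec3:
        # Offset from the object centre, rotated with the object, re-attached at the new centre.
        return add(new_obj, rotate_z(sub(hand, frame.obj_pos), d_yaw))

    return HOIFrame(new_obj, frame.obj_yaw + d_yaw, carry(frame.left_hand), carry(frame.right_hand))


def augment(clip, n_copies, max_shift=0.05, max_yaw=0.2, seed=0):
    """Return `n_copies` perturbed versions of `clip` (a list of HOIFrame).

    One shift and one yaw offset are drawn per copy and applied to every
    frame, so each augmented clip stays temporally consistent.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        # Horizontal shift only; the object's height is left unchanged.
        d_pos = (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift), 0.0)
        d_yaw = rng.uniform(-max_yaw, max_yaw)
        copies.append([perturb_frame(f, d_pos, d_yaw) for f in clip])
    return copies


# Usage: a two-frame toy clip in which the box is lifted by 0.2 m.
clip = [
    HOIFrame((1.0, 0.0, 0.5), 0.0, (1.0, 0.2, 0.6), (1.0, -0.2, 0.6)),
    HOIFrame((1.2, 0.0, 0.7), 0.0, (1.2, 0.2, 0.8), (1.2, -0.2, 0.8)),
]
copies = augment(clip, n_copies=4)
```

Drawing one perturbation per copy rather than per frame is a deliberate choice in this sketch: a fresh perturbation on every frame would make the object jitter between frames instead of simulating a consistently displaced box.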

Paper Structure

This paper contains 20 sections, 10 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Two groups of live photos from a real-world deployment of challenging interaction tasks. Top group: the robot visually perceives the box's pose while picking up the high-density box, walking with it, and putting it down. Bottom group: the robot must bend slightly and push the box forward continuously. During the interaction, the policy adjusts in real time to unfavorable box poses to ensure completion of the HOI task.
  • Figure 2: Overall framework of InterReal. InterReal consists of three main components: motion data preprocessing, multi-motion multi-environment learning, and deployment. The framework retargets HOI motions into trainable motions matched to the G1 robot's body shape, achieves accurate motion tracking in complex HOI training settings, and ultimately supports real-world deployment.
  • Figure 3: Comparison of tracking accuracy among InterReal and baselines on the box-picking task.
  • Figure 4: Ablation results for the internal coefficient $\delta$ of the meta-learning module on the box-picking task.
  • Figure 5: Adaptation curves of the reward weight coefficients during training.
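The abstract describes a meta-policy that uses tracking error metrics to allocate reward weights, and Figure 5 shows those weights changing during training. The paper's meta-policy is learned; the sketch below substitutes a much simpler fixed heuristic (a multiplicative-weights update) purely to illustrate error-driven reweighting. The function name, the reward-term names, `step`, and `floor` are hypothetical, and `step` should not be read as the paper's coefficient $\delta$, whose exact role is not stated here.

```python
import math


def update_reward_weights(weights, errors, step=0.5, floor=0.05):
    """Hypothetical heuristic: shift reward weight toward terms with larger tracking error.

    weights: dict mapping reward-term name -> current weight (weights sum to 1)
    errors:  dict mapping the same names -> latest mean tracking error (>= 0)
    step:    how strongly weights react to relative error (illustrative knob)
    floor:   minimum weight kept for every term so that none is switched off
    """
    n = len(weights)
    if floor * n > 1:
        raise ValueError("floor * number of terms must not exceed 1")
    mean_err = sum(errors.values()) / n
    scale = mean_err + 1e-8  # avoids division by zero when all errors are 0
    # Multiplicative-weights step: terms above the mean error grow, terms below shrink.
    raw = {k: w * math.exp(step * (errors[k] - mean_err) / scale) for k, w in weights.items()}
    total = sum(raw.values())
    # Renormalize, then mix with a constant floor: every term keeps >= floor and the sum stays 1.
    return {k: (1 - floor * n) * (v / total) + floor for k, v in raw.items()}


# Usage: four hypothetical tracking terms; the object-position error is currently largest.
weights = {"body_pos": 0.25, "body_rot": 0.25, "hand_pos": 0.25, "object_pos": 0.25}
errors = {"body_pos": 0.1, "body_rot": 0.1, "hand_pos": 0.1, "object_pos": 0.5}
for it in range(3):
    weights = update_reward_weights(weights, errors)
    print(it, {k: round(v, 3) for k, v in weights.items()})
```

One plausible reading of the abstract is that a learned meta-policy treats the weight vector as its action and is rewarded for reducing tracking error, which would let it explore allocations rather than follow a fixed rule; the heuristic above only captures the direction of that reallocation.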