HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model

Dongting Li, Xingyu Chen, Qianyang Wu, Bo Chen, Sikai Wu, Hanyu Wu, Guoyao Zhang, Liang Li, Mingliang Zhou, Diyun Xiang, Jianzhu Ma, Qiang Zhang, Renjing Xu

TL;DR

HAIC targets robust humanoid interaction with underactuated objects under visual occlusion. It introduces a dynamics-aware world model that predicts high-order object states from proprioception and explicitly projects them onto a static geometric prior to form a dynamic occupancy map. The framework comprises an Object Adapter, Explicit Geometric Projection, and a Privilege Adapter, trained with a two-stage asymmetric distillation regime that uses exponential moving average (EMA) stabilization to bridge the sim-to-real gap. The key contributions are the Dynamics-Aware World Model, Asymmetric Adaptive Distillation, and a multimodal contact reward. Real-world experiments report state-of-the-art performance on skateboarding and cart manipulation, as well as on multi-terrain carrying and long-horizon tasks. The results show proactive inertia compensation and improved stability, with zero-shot generalization across object size, terrain orientation, and external perturbations, which matters for deploying agile, sensor-limited humanoids in unstructured environments. Evaluation is quantitative, using metrics such as $E_{mpbpe}$, $E_{mpboe}$, and $E_{mpjpe}$.
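The summary mentions EMA stabilization but does not say which quantities are averaged. As a minimal sketch, assuming the average is taken over the parameters of a slowly updated copy of a network, the update could look like the following. The function name `ema_update`, the dictionary representation of parameters, and the decay value are illustrative assumptions, not details from the paper.

```python
def ema_update(target, online, decay=0.99):
    """Blend online parameters into a slowly moving target copy, in place.

    Hypothetical helper: both arguments are dicts mapping a parameter
    name to a float. Applies target <- decay * target + (1 - decay) * online.
    """
    for name, value in online.items():
        target[name] = decay * target[name] + (1.0 - decay) * value
    return target


# Usage: the target drifts toward the online value a little on every step.
target_params = {"w": 1.0}
online_params = {"w": 0.0}
ema_update(target_params, online_params, decay=0.9)
ema_update(target_params, online_params, decay=0.9)
```

A decay close to 1 makes the target change slowly, which is the usual reason to use such an average when the quantity being tracked is noisy.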

Abstract

Humanoid robots show promise for complex whole-body tasks in unstructured environments. Although Human-Object Interaction (HOI) has advanced, most methods focus on fully actuated objects rigidly coupled to the robot, ignoring underactuated objects with independent dynamics and non-holonomic constraints. These introduce control challenges from coupling forces and occlusions. We present HAIC, a unified framework for robust interaction across diverse object dynamics without external state estimation. Our key contribution is a dynamics predictor that estimates high-order object states (velocity, acceleration) solely from proprioceptive history. These predictions are projected onto static geometric priors to form a spatially grounded dynamic occupancy map, enabling the policy to infer collision boundaries and contact affordances in blind spots. We use asymmetric fine-tuning, where a world model continuously adapts to the student policy's exploration, ensuring robust state estimation under distribution shifts. Experiments on a humanoid robot show HAIC achieves high success rates in agile tasks (skateboarding, cart pushing/pulling under various loads) by proactively compensating for inertial perturbations, and also masters multi-object long-horizon tasks like carrying a box across varied terrain by predicting the dynamics of multiple objects.
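The projection step described above can be illustrated with a small planar sketch. It assumes a 2D setting, a constant-velocity extrapolation of the predicted object state, and a point-set representation of the static geometric prior. None of these choices, nor the name `project_to_occupancy` and its parameters, come from the paper; they only show the general idea of turning a predicted state plus fixed geometry into an occupancy grid.

```python
import math


def project_to_occupancy(prior_points, pose, velocity, dt,
                         grid_size=8, cell=0.25):
    """Rasterize object-frame prior points into a robot-centred grid.

    prior_points: (x, y) points on the object in its own frame
                  (stands in for the static geometric prior).
    pose:         predicted (x, y, yaw) of the object in the robot frame.
    velocity:     predicted (vx, vy, yaw_rate) in the robot frame.
    dt:           look-ahead time in seconds.
    All names and the constant-velocity model are illustrative assumptions.
    """
    # Extrapolate the predicted pose forward using the predicted velocity.
    x = pose[0] + velocity[0] * dt
    y = pose[1] + velocity[1] * dt
    yaw = pose[2] + velocity[2] * dt
    c, s = math.cos(yaw), math.sin(yaw)

    half = grid_size * cell / 2.0  # grid spans [-half, half) on both axes
    grid = [[0] * grid_size for _ in range(grid_size)]
    for px, py in prior_points:
        # Rigid transform from the object frame to the robot frame.
        rx = x + c * px - s * py
        ry = y + s * px + c * py
        col = math.floor((rx + half) / cell)
        row = math.floor((ry + half) / cell)
        if 0 <= row < grid_size and 0 <= col < grid_size:
            grid[row][col] = 1  # mark the cell as occupied
    return grid


# Usage: a single prior point, with the object moving along +x at 0.5 m/s.
now = project_to_occupancy([(0.0, 0.0)], (0.1, 0.1, 0.0), (0.5, 0.0, 0.0), dt=0.0)
later = project_to_occupancy([(0.0, 0.0)], (0.1, 0.1, 0.0), (0.5, 0.0, 0.0), dt=1.0)
```

Because the geometry is fixed, only the predicted state changes between calls, so the occupied cells shift with the predicted motion even when the object itself is not observed.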

Paper Structure (53 sections, 9 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: HAIC excels at complex interactions, particularly with underactuated objects, and significantly outperforms the baseline.
  • Figure 2: Overview of our Dynamics-aware World Model. It predicts object dynamics from proprioception and explicitly projects them onto a static geometric prior to reconstruct the privileged information.
  • Figure 3: Framework overview. We train policies from scratch in simulation. The framework includes a privileged teacher and a dynamics-aware student. The student policy uses the learned world model to perform robust interaction tasks, such as skateboarding, on a real humanoid robot.
  • Figure 4: Multiple Objects Contact Guidance Strategy.
  • Figure 5: Real-world performance comparison with the baseline across various tasks. With the specifically designed dynamics-aware world model, HAIC maintains robust stability throughout the interaction, whereas the baseline suffers from balance failures and trajectory drift.
  • ...and 2 more figures