It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots
Zuhong Liu, Junhao Ge, Minhao Xiong, Jiahao Gu, Bowei Tang, Wei Jing, Siheng Chen
TL;DR
The paper tackles dual-humanoid motion imitation and the isolation issue of single-humanoid approaches, which ignore inter-agent dynamics. It introduces Harmanoid, a two-stage framework that combines contact-aware motion retargeting with an interaction-driven motion controller to preserve interaction geometry and enforce physically plausible, synchronized behaviors. Experiments on the Inter-X dataset show higher success rates, fewer interpenetrations, and more coherent coordination than single-humanoid baselines; ablations demonstrate the contributions of the contact rewards, the interaction rewards, and the curriculum. The work advances practical dual-robot collaboration and provides a pathway toward real-world deployment with perception and communication integration.
Abstract
The true promise of humanoid robotics lies beyond single-agent autonomy: two or more humanoids must engage in physically grounded, socially meaningful whole-body interactions that echo the richness of human social interaction. However, single-humanoid methods suffer from the isolation issue, ignoring inter-agent dynamics and causing misaligned contacts, interpenetrations, and unrealistic motions. To address this, we present Harmanoid, a dual-humanoid motion imitation framework that transfers interacting human motions to two robots while preserving both kinematic fidelity and physical realism. Harmanoid comprises two key components: (i) contact-aware motion retargeting, which restores inter-body coordination by aligning SMPL contacts with robot vertices, and (ii) an interaction-driven motion controller, which leverages interaction-specific rewards to enforce coordinated keypoints and physically plausible contacts. By explicitly modeling inter-agent contacts and interaction-aware dynamics, Harmanoid captures the coupled behaviors between humanoids that single-humanoid frameworks inherently overlook. Experiments demonstrate that Harmanoid significantly improves interactive motion imitation, surpassing existing single-humanoid frameworks that largely fail in such scenarios.
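To make the idea of an interaction-specific reward concrete, the sketch below shows one way a "coordinated keypoints" term could be computed. The abstract does not give the reward's form, so everything here is an assumption for illustration: the function name `interaction_keypoint_reward`, the exponential kernel, and the `sigma` scale are hypothetical and are not taken from Harmanoid. The term compares the offsets between the two agents' keypoints in simulation against the same offsets in the reference motion, so it depends on how the agents are placed relative to each other rather than on each agent's tracking error in isolation.

```python
import numpy as np


def interaction_keypoint_reward(ref_a, ref_b, sim_a, sim_b, sigma=0.25):
    """Hypothetical inter-agent keypoint reward (illustrative, not the paper's).

    ref_a, ref_b: (K, 3) reference keypoint positions of agents A and B.
    sim_a, sim_b: (K, 3) simulated keypoint positions of agents A and B.
    sigma: assumed length scale (metres) controlling how fast reward decays.
    Returns a value in (0, 1]; 1 means the relative geometry matches exactly.
    """
    ref_a, ref_b, sim_a, sim_b = (
        np.asarray(x, dtype=float) for x in (ref_a, ref_b, sim_a, sim_b)
    )
    # Offsets from every keypoint of A to every keypoint of B: shape (K, K, 3).
    ref_offsets = ref_b[None, :, :] - ref_a[:, None, :]
    sim_offsets = sim_b[None, :, :] - sim_a[:, None, :]
    # Mean squared deviation of the simulated offsets from the reference ones.
    err = np.mean(np.sum((sim_offsets - ref_offsets) ** 2, axis=-1))
    return float(np.exp(-err / sigma**2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_a = rng.normal(size=(4, 3))
    ref_b = rng.normal(size=(4, 3)) + np.array([1.0, 0.0, 0.0])
    # Moving both agents together keeps the relative geometry intact.
    shift = np.array([0.3, -0.2, 0.0])
    print(interaction_keypoint_reward(ref_a, ref_b, ref_a + shift, ref_b + shift))
    # Moving only agent B changes the relative geometry and lowers the reward.
    print(interaction_keypoint_reward(ref_a, ref_b, ref_a, ref_b + shift))
```

Because only inter-agent offsets enter the error, translating both agents together leaves this term unchanged, while displacing one agent relative to the other lowers it. A term of this kind would therefore complement, not replace, ordinary per-agent tracking rewards.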
