Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis
Kaiyang Ji, Ye Shi, Zichen Jin, Kangyi Chen, Lan Xu, Yuexin Ma, Jingyi Yu, Jingya Wang
TL;DR
The paper tackles real-time, physically plausible interaction between humans and diverse agents (avatars, humanoids, robots) in immersive settings. It introduces Human-X, a real-time auto-regressive action-reaction diffusion framework coupled with an actor-aware tracking policy that targets safety, physical realism, and responsiveness. By combining diffusion-based reaction generation, reactor-centric representations, and a tracking policy trained with reinforcement learning, the method improves motion quality, interaction continuity, and physical plausibility over prior methods on the Inter-X and InterHuman datasets, and it is demonstrated in a virtual reality interface for human-robot interaction. Ablations and user evaluations support the design, which offers a low-latency, physics-consistent synthesis pipeline for practical human-machine collaboration.
Abstract
Real-time synthesis of physically plausible human interactions remains a critical challenge for immersive VR/AR systems and humanoid robotics. While existing methods demonstrate progress in kinematic motion generation, they often fail to address the fundamental tension between real-time responsiveness, physical feasibility, and safety requirements in dynamic human-machine interactions. We introduce Human-X, a novel framework designed to enable immersive and physically plausible human interactions across diverse entities, including human-avatar, human-humanoid, and human-robot systems. Unlike existing approaches that focus on post-hoc alignment or simplified physics, our method jointly predicts actions and reactions in real time using an auto-regressive reaction diffusion planner, ensuring seamless synchronization and context-aware responses. To enhance physical realism and safety, we integrate an actor-aware motion tracking policy trained with reinforcement learning, which dynamically adapts to interaction partners' movements while avoiding artifacts such as foot sliding and penetration. Extensive experiments on the Inter-X and InterHuman datasets demonstrate significant improvements in motion quality, interaction continuity, and physical plausibility over state-of-the-art methods. Our framework is validated in real-world applications, including a virtual reality interface for human-robot interaction, showcasing its potential for advancing human-robot collaboration.
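
To make the two-stage structure described in the abstract concrete, the following is a minimal, purely illustrative sketch of an auto-regressive plan-then-track loop. It is not the authors' implementation: `toy_planner` is a hypothetical stand-in for the reaction diffusion planner (it interpolates toward the actor instead of denoising), `toy_tracker` is a hypothetical stand-in for the actor-aware tracking policy (it clamps per-frame motion and keeps a minimum gap from the actor instead of running a learned policy in a physics simulator), and the parameters `window`, `horizon`, `max_step`, and `min_gap` are assumptions chosen for the example.

```python
import math
from collections import deque
from typing import Iterable, List, Sequence

Pose = List[float]  # toy stand-in for a full-body pose vector


def toy_planner(actor_hist: Sequence[Pose], reactor_hist: Sequence[Pose],
                horizon: int) -> List[Pose]:
    """Hypothetical planner: returns `horizon` reference poses for the reactor.

    A real planner would run denoising steps conditioned on both histories;
    this toy version moves halfway toward the actor's latest pose each frame.
    """
    target, current = actor_hist[-1], reactor_hist[-1]
    plan = []
    for _ in range(horizon):
        current = [c + 0.5 * (t - c) for c, t in zip(current, target)]
        plan.append(current)
    return plan


def toy_tracker(reference: Pose, reactor: Pose, actor: Pose,
                max_step: float = 0.2, min_gap: float = 0.1) -> Pose:
    """Hypothetical tracker: follow the reference under crude feasibility limits.

    Per-coordinate motion is clamped to `max_step`, and the move is rejected
    if it would bring the reactor closer than `min_gap` to the actor.
    """
    candidate = [r + max(-max_step, min(max_step, ref - r))
                 for r, ref in zip(reactor, reference)]
    if math.dist(candidate, actor) >= min_gap:
        return candidate
    return list(reactor)  # hold position rather than penetrate


def rollout(actor_stream: Iterable[Pose], reactor_start: Pose,
            window: int = 4, horizon: int = 2) -> List[Pose]:
    """Auto-regressive loop: replan every `horizon` frames from recent history."""
    actor_hist = deque(maxlen=window)
    reactor_hist = deque([list(reactor_start)], maxlen=window)
    plan: List[Pose] = []
    out: List[Pose] = []
    for actor_pose in actor_stream:
        actor_hist.append(list(actor_pose))
        if not plan:  # previous chunk consumed: plan the next one
            plan = toy_planner(actor_hist, reactor_hist, horizon)
        reference = plan.pop(0)
        new_pose = toy_tracker(reference, reactor_hist[-1], actor_pose)
        reactor_hist.append(new_pose)  # tracked output feeds the next plan
        out.append(new_pose)
    return out


if __name__ == "__main__":
    frames = rollout([[1.0, 0.0]] * 10, [0.0, 0.0])
    print(len(frames), "reactor frames generated")
```

The point of the sketch is the data flow: the planner only ever sees a short sliding window of past motion, and the pose fed back into that window is the tracker's output rather than the planner's raw reference, so any correction made for feasibility carries into the next planning step.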
