Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models
Ilyass Taouil, Haizhou Zhao, Angela Dai, Majid Khadiv
TL;DR
The paper tackles the problem of planning physically plausible, long-horizon loco-manipulation for humanoids. It introduces a pipeline that uses latent diffusion models (LDMs) to synthesize 2D RGB human-object interaction scenes, then extracts 3D contact locations and robot configurations from those scenes to guide whole-body trajectory optimization (TO). This is presented as the first integration of LDM-based planning cues with centroidal dynamics and contact-rich TO. It is demonstrated in two challenging simulation scenarios, with ablations that validate geometry-aware contact transfer and keyframe-guided warm starts. The approach offers a practical path toward scalable, visually guided humanoid planning, with potential impact on real-world long-horizon manipulation tasks.
Abstract
This paper leverages the ability of latent diffusion models (LDMs) to generate realistic RGB human-object interaction scenes in order to guide humanoid loco-manipulation planning. To do so, we extract both contact locations and robot configurations from the generated images and use them inside a whole-body trajectory optimization (TO) formulation to generate physically consistent trajectories for humanoids. We validate the full pipeline in simulation on different long-horizon loco-manipulation scenarios and perform an extensive analysis of the proposed contact and robot configuration extraction pipeline. Our results show that, using the information extracted from LDMs, we can generate physically consistent trajectories for tasks that require long-horizon reasoning.
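
To make "physically consistent" concrete, the following is a minimal sketch of one common reading of the term: a trajectory sample satisfies the centroidal dynamics when the contact forces account for the rate of change of linear and angular momentum. It is an illustration under stated assumptions, not the paper's formulation. The function name `centroidal_residual`, the point-contact model (no contact torques), the z-up world frame, and the numbers in the example are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Assumption: world frame with z pointing up, gravity in m/s^2.
GRAVITY: Vec3 = (0.0, 0.0, -9.81)


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def centroidal_residual(
    mass: float,
    com: Vec3,
    com_acc: Vec3,
    ang_mom_rate: Vec3,
    contacts: List[Tuple[Vec3, Vec3]],
) -> Tuple[Vec3, Vec3]:
    """Residual of the centroidal dynamics for one trajectory sample.

    Assumes point contacts given as (position, force) pairs:
        linear:  m * (c_ddot - g) - sum_i f_i
        angular: L_dot - sum_i (p_i - c) x f_i
    Both residuals are zero when the sample is dynamically consistent.
    """
    lin = [mass * (com_acc[k] - GRAVITY[k]) for k in range(3)]
    ang = list(ang_mom_rate)
    for pos, force in contacts:
        arm = tuple(pos[k] - com[k] for k in range(3))
        torque = cross(arm, force)
        for k in range(3):
            lin[k] -= force[k]
            ang[k] -= torque[k]
    return (lin[0], lin[1], lin[2]), (ang[0], ang[1], ang[2])


# Hypothetical example: a 50 kg robot standing still on two feet,
# each foot carrying half of the weight.
half_weight = 0.5 * 50.0 * 9.81
feet = [
    ((0.0, 0.1, 0.0), (0.0, 0.0, half_weight)),
    ((0.0, -0.1, 0.0), (0.0, 0.0, half_weight)),
]
lin_res, ang_res = centroidal_residual(
    50.0, (0.0, 0.0, 0.9), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), feet
)
```

In a TO formulation, residuals of this kind would typically appear as equality constraints at each time step, with the contact positions supplied by whatever contact extraction stage precedes the optimization.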
