Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models

Ilyass Taouil, Haizhou Zhao, Angela Dai, Majid Khadiv

TL;DR

The paper addresses the planning of physically plausible, long-horizon loco-manipulation for humanoids. Its pipeline uses latent diffusion models (LDMs) to synthesize 2D RGB human-object interaction scenes, then extracts 3D contact locations and robot configurations from those images to guide a whole-body trajectory optimization (TO). The result combines LDM-based planning cues with centroidal dynamics and contact-rich TO. It is demonstrated in two simulation scenarios, with ablations that evaluate geometry-aware contact transfer and keyframe-guided warm starts. The approach points toward scalable, visually guided humanoid planning, with potential relevance to real-world long-horizon manipulation tasks.

Abstract

This paper uses latent diffusion models (LDMs) to generate realistic RGB human-object interaction scenes that guide humanoid loco-manipulation planning. From the generated images, we extract both contact locations and robot configurations, which are then used inside a whole-body trajectory optimization (TO) formulation to generate physically consistent trajectories for humanoids. We validate the full pipeline in simulation on different long-horizon loco-manipulation scenarios and perform an extensive analysis of the proposed contact and robot configuration extraction pipeline. Our results show that the information extracted from LDMs lets us generate physically consistent trajectories for tasks that require long-horizon reasoning.

Paper Structure

This paper contains 26 sections, 11 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: A loco-manipulation task achieved with our approach.
  • Figure 2: Pipeline overview.
  • Figure 3: Contact extraction procedure.
  • Figure 4: Keyframe extraction procedure.
  • Figure 5: Comparison of collision penetrations in the TO output between our proposed pipeline (blue) and a naive approach (red), for both the laundry scenario (S1) and the trolley scenario (S2), with and without collision penalties enabled.
  • ...and 1 more figure