SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

Peishan Cong; Ziyi Wang; Yuexin Ma; Xiangyu Yue

TL;DR

SemGeoMo tackles dynamic contextual human motion generation by integrating textual semantic guidance with hierarchical geometric cues from sequential point clouds. The approach introduces an automated LLM Annotator to generate coarse and fine textual guidance, a SemGeo Hierarchical Guidance stage with dual-branch transformers to produce coarse hand-joint and affordance cues, and a SemGeo-guided Motion Generation stage using Motion ControlNet for high-fidelity full-body motions. Multi-level conditioning combines text features (via CLIP/LONGCLIP) with geometry features (joint positions and affordances) through a SemGeo Condition Module and mutual cross-attention, followed by loss-guided refinement and L-BFGS-based posterior updates. Experiments across FullBodyManipulation, BEHAVE, IMHD$^2$, and unseen HoDome demonstrate state-of-the-art performance and strong generalization to unseen objects and human–human interactions, while ablations confirm the value of textual guidance and coarse-to-fine geometric cues. The work offers a practical framework for realistic interactive motion generation with interpretable language descriptions, enabling improved human–robot interaction and immersive simulations.

Abstract

Generating plausible, high-quality human interactive motions in a given dynamic environment is crucial for understanding, modeling, transferring, and applying human behaviors to both virtual and physical robots. In this paper, we introduce SemGeoMo, an effective method for dynamic contextual human motion generation that fully leverages multi-level semantic and geometric guidance (text, affordance, and joint) in the generation process, improving the semantic rationality and geometric correctness of the generated motions. Our method achieves state-of-the-art performance on three datasets and demonstrates superior generalization capability across diverse interaction scenarios. The project page and code can be found at https://4dvlab.github.io/project_page/semgeomo/.
