Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts
Sheng Liu, Yuanzhi Liang, Jiepeng Wang, Sidan Du, Chi Zhang, Xuelong Li
TL;DR
Uni-Inter addresses the challenge of generating coherent human motion in compound interaction scenarios. It places humans, objects, and scenes in a single 3D representational space called the Unified Interactive Volume (UIV). Motion is modeled as joint-wise spatial distributions over the UIV, which turns generation into a spatial inference problem and supports reasoning about physical constraints, social dynamics, and task semantics. The approach uses a diffusion-based generator conditioned on text and the UIV, together with UIV-aligned regularization and a multi-task training regime. It achieves competitive or superior results on human-object, human-human, and human-scene benchmarks and generalizes well to unseen combinations of entities. This unified formulation offers scalable, context-aware motion synthesis for complex, real-world environments, with potential applications in character animation, embodied AI, and interactive graphics.
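The idea of placing heterogeneous entities in one shared volume can be illustrated with a minimal sketch. It assumes that each entity is given as a 3D point set and that the UIV is a multi-channel occupancy grid with one channel per entity category. The names `build_uiv` and `ENTITY_CHANNELS`, the grid size, and the spatial bounds are hypothetical. The paper's actual UIV encoding may use richer per-voxel features.

```python
import numpy as np

# Hypothetical channel layout (assumption): one occupancy channel per entity category.
ENTITY_CHANNELS = {"human": 0, "object": 1, "scene": 2}


def build_uiv(entities, grid_size=32, bounds=(-2.0, 2.0)):
    """Voxelize point sets of heterogeneous entities into one shared volume.

    entities: dict mapping a category name to an (N, 3) array of xyz points.
    Returns a float32 occupancy tensor of shape (C, G, G, G).
    """
    lo, hi = bounds
    voxel_size = (hi - lo) / grid_size
    uiv = np.zeros((len(ENTITY_CHANNELS),) + (grid_size,) * 3, dtype=np.float32)
    for name, points in entities.items():
        pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
        idx = np.floor((pts - lo) / voxel_size).astype(int)
        # Drop points that fall outside the volume bounds.
        inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
        idx = idx[inside]
        # Mark every occupied voxel in this entity's channel.
        uiv[ENTITY_CHANNELS[name], idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return uiv


# Usage: a toy human point cloud, one object point, and two scene points.
rng = np.random.default_rng(0)
entities = {
    "human": rng.uniform(-0.5, 0.5, size=(100, 3)),
    "object": np.array([[1.0, 0.0, 0.0]]),
    "scene": np.array([[0.0, -1.9, 0.0], [5.0, 5.0, 5.0]]),  # second point lies outside the bounds
}
uiv = build_uiv(entities)
print(uiv.shape)  # (3, 32, 32, 32)
```

Because every entity type shares the same grid coordinates, a single network can attend to human, object, and scene occupancy at the same location. The sketch uses this layout to illustrate the shared spatial field. Whether the paper stacks channels in this way is not stated.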
Abstract
We present Uni-Inter, a unified framework for human motion generation. It supports a wide range of interaction scenarios, including human-human, human-object, and human-scene interactions, within a single, task-agnostic architecture. Existing methods rely on task-specific designs and generalize poorly. Uni-Inter instead introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This shared field enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, which allows the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments on three representative interaction tasks show that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions is a promising direction for scalable motion synthesis in complex environments.
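Joint-wise probabilistic prediction over a volume can be illustrated with a minimal sketch. It assumes that a model outputs one logit volume per joint on a voxel grid. Each joint's logits are normalized into a probability distribution over voxels, and a continuous position is read out as the expectation over voxel centers (soft-argmax). The voxel containing the ground-truth joint supplies a negative log-likelihood training signal. The function names, grid size, and bounds are hypothetical. The soft-argmax readout and the NLL loss are also illustrative assumptions, not the paper's confirmed design, which conditions a diffusion-based generator on text and the UIV.

```python
import numpy as np


def voxel_centers(grid_size, bounds=(-2.0, 2.0)):
    """World-space xyz of every voxel center, flattened in C order: shape (G^3, 3)."""
    lo, hi = bounds
    step = (hi - lo) / grid_size
    axis = lo + (np.arange(grid_size) + 0.5) * step
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)


def joint_distributions(logits):
    """Per-joint softmax over all voxels: (J, G, G, G) logits -> (J, G^3) probabilities."""
    flat = logits.reshape(logits.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(flat)
    return probs / probs.sum(axis=1, keepdims=True)


def expected_joint_positions(probs, centers):
    """Soft-argmax: expected xyz of each joint under its voxel distribution, shape (J, 3)."""
    return probs @ centers


def joint_nll(probs, target_xyz, grid_size, bounds=(-2.0, 2.0)):
    """Negative log-likelihood of the voxel containing each ground-truth joint, shape (J,)."""
    lo, hi = bounds
    step = (hi - lo) / grid_size
    idx = np.clip(np.floor((target_xyz - lo) / step).astype(int), 0, grid_size - 1)
    flat_idx = np.ravel_multi_index(tuple(idx.T), (grid_size,) * 3)
    p = probs[np.arange(len(flat_idx)), flat_idx]
    return -np.log(np.maximum(p, 1e-12))


# Usage: two joints on an 8^3 grid, each with a sharp logit peak at its target voxel.
G = 8
targets = np.array([[0.1, 0.1, 0.1], [-1.9, 1.9, 0.6]])
logits = np.zeros((2, G, G, G))
logits[0, 4, 4, 4] = 50.0
logits[1, 0, 7, 5] = 50.0
probs = joint_distributions(logits)
positions = expected_joint_positions(probs, voxel_centers(G))
losses = joint_nll(probs, targets, G)
```

Reading positions out by expectation keeps the mapping from a distribution to joint coordinates differentiable, whereas a hard argmax is not. A per-joint distribution can also stay spread out when several placements are plausible, which is one motivation for predicting distributions rather than single points.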
