TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhucun Xue, Yong Liu
TL;DR
TIMotion addresses the challenge of two-person motion generation by embedding temporal dynamics and inter-person interactions within a general MetaMotion framework. It introduces three core components (Causal Interactive Injection, Role-Evolving Scanning, and Localized Pattern Amplification) to capture causal interactions, dynamic role switches, and short-term motion patterns, respectively, and is compatible with Transformer, Mamba, and RWKV backbones. Extensive experiments on InterHuman and InterX show TIMotion achieves state-of-the-art or competitive results with improved efficiency and editability, validated by both quantitative metrics and qualitative assessments. This approach provides a scalable, flexible paradigm for realistic human-human motion generation and editing with potential applications in animation, robotics, and human-robot collaboration.
Abstract
Human-human motion generation is essential for understanding humans as social beings. Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. To delve into this field, we abstract the overall generation process into a general framework, MetaMotion, which consists of two phases: temporal modeling and interaction mixing. For temporal modeling, the single-person-based methods directly concatenate two people into a single one, while the separate modeling-based methods skip the modeling of interaction sequences. This inadequate modeling results in sub-optimal performance and redundant model parameters. In this paper, we introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation. Specifically, we first propose Causal Interactive Injection, which models the two separate sequences as a single causal sequence by leveraging their temporal and causal properties. We then present Role-Evolving Scanning to adapt to changes in the active and passive roles throughout the interaction. Finally, to generate smoother and more plausible motion, we design Localized Pattern Amplification to capture short-term motion patterns. Extensive experiments on InterHuman and InterX demonstrate that our method achieves superior performance. Project page: https://aigc-explorer.github.io/TIMotion-page/
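To make the idea of treating two motion sequences as one causal sequence more concrete, here is a minimal sketch under stated assumptions. The abstract does not specify how the sequences are combined, so the frame-by-frame interleaving below is only one plausible reading, not the authors' implementation. The names `Frame`, `interleave_causal`, `role_swapped`, and `split_causal` are hypothetical and introduced purely for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-frame pose feature vector (an assumption, not from the paper).
Frame = Tuple[float, ...]


def interleave_causal(person_a: Sequence[Frame],
                      person_b: Sequence[Frame]) -> List[Frame]:
    """Merge two equal-length sequences into one: a1, b1, a2, b2, ...

    Assumption: interleaving by time step is one way to obtain a single
    sequence that preserves temporal order across both people.
    """
    if len(person_a) != len(person_b):
        raise ValueError("both sequences must have the same number of frames")
    merged: List[Frame] = []
    for frame_a, frame_b in zip(person_a, person_b):
        merged.append(frame_a)
        merged.append(frame_b)
    return merged


def role_swapped(person_a: Sequence[Frame],
                 person_b: Sequence[Frame]) -> List[Frame]:
    """Same merge with the order reversed: b1, a1, b2, a2, ...

    Assumption: scanning both orders is one way to avoid fixing which
    person is treated as the active one.
    """
    return interleave_causal(person_b, person_a)


def split_causal(merged: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Recover the two per-person sequences from an interleaved one."""
    return list(merged[0::2]), list(merged[1::2])


if __name__ == "__main__":
    a = [(0.0,), (1.0,)]
    b = [(10.0,), (11.0,)]
    print(interleave_causal(a, b))
    print(role_swapped(a, b))
```

In this sketch a sequence model would consume the merged list in order, so each frame can only depend on earlier frames from either person. The second ordering illustrates how a role change could be represented without altering the underlying data.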
