Towards more realistic human motion prediction with attention to motion coordination
Pengxiang Ding, Jianqin Yin
TL;DR
This work tackles the realism gap in human motion prediction by explicitly modeling global motion coordination alongside local joint interactions. It introduces a Comprehensive Joint Relation Extractor (CJRE), built on a learned coordination attractor (CA), and a Multi-timescale Dynamics Extractor (MTDE). CJRE captures global joint coordination, and MTDE extracts enriched intra-joint dynamics. The Global Coordination Extractor (GCE) and Local Interaction Extractor (LIE) within CJRE, together with the Adaptive Feature Fusing Module (AFFM), let the model consider global and local relations simultaneously. The framework reduces MPJPE on H3.6M, CMU-Mocap, and 3DPW for both short- and long-term horizons. Experimental results, including ablations and qualitative visualizations, show that it produces more realistic, coordinated motions, which is relevant to applications in robotics, animation, and perception systems.
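MPJPE (Mean Per Joint Position Error), the metric behind the reported gains, is the average Euclidean distance between predicted and ground-truth 3D joint positions; lower is better, and it is usually reported in millimeters. The following is a minimal sketch, assuming poses are nested lists indexed as [frame][joint] giving (x, y, z); the function name `mpjpe` is our own, and motion prediction benchmarks typically report the error at individual prediction horizons rather than pooled over all frames as done here.

```python
import math
from typing import Sequence

# pred[t][j] and gt[t][j] are the (x, y, z) positions of joint j at future frame t.
Pose = Sequence[Sequence[float]]


def mpjpe(pred: Sequence[Pose], gt: Sequence[Pose]) -> float:
    """Mean Per Joint Position Error over all frames and joints (lower is better)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same number of frames and joints")
    # Euclidean error of every joint in every frame.
    dists = [math.dist(pj, gj) for p, g in zip(pred, gt) for pj, gj in zip(p, g)]
    if not dists:
        raise ValueError("empty input")
    return sum(dists) / len(dists)


# One frame, two joints: errors 0 and 5, so MPJPE is 2.5.
print(mpjpe([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]], [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]))  # → 2.5
```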
Abstract
Joint relation modeling is a crucial component of human motion prediction. Most existing methods rely on skeleton-based graphs to build joint relations, which capture the local interactive relations between joint pairs well. However, motion coordination, a global joint relation reflecting the simultaneous cooperation of all joints, is usually weakened because it is learned progressively and asynchronously, from parts to the whole. As a result, the predicted motions often appear unrealistic. To tackle this issue, we learn a medium, called the coordination attractor (CA), from the spatiotemporal features of motion to characterize global motion features, and then use it to build new relative joint relations. Through the CA, all joints are related simultaneously, so the motion coordination of all joints can be learned better. Based on this, we further propose a novel joint relation modeling module, the Comprehensive Joint Relation Extractor (CJRE), which combines this motion coordination with the local interactions between joint pairs in a unified manner. We also present a Multi-timescale Dynamics Extractor (MTDE) that extracts enriched dynamics from raw position information for effective prediction. Extensive experiments show that the proposed framework outperforms state-of-the-art methods in both short- and long-term prediction on H3.6M, CMU-Mocap, and 3DPW.
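The core intuition, relating every joint to one shared global quantity instead of only to its skeletal neighbors, can be illustrated with a toy sketch. This is not the authors' CJRE or MTDE. It rests on several assumptions: the attractor here is simply the mean feature over all joints (the paper learns the CA from spatiotemporal features), the "multi-timescale dynamics" are plain position differences at hypothetical strides 1, 2 and 4, and every function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def multi_timescale_dynamics(traj: Sequence[Vec],
                             strides: Sequence[int] = (1, 2, 4)) -> Dict[int, List[Vec]]:
    """Hypothetical stand-in for MTDE: displacements of one joint over several time gaps.

    traj[t] is the joint's position at frame t. For stride s the result holds
    traj[t] - traj[t - s] for every t >= s, i.e. velocity-like features at timescale s.
    """
    return {
        s: [[b - a for a, b in zip(traj[t - s], traj[t])] for t in range(s, len(traj))]
        for s in strides
    }


def coordination_attractor(feats: Sequence[Vec]) -> Vec:
    """Hypothetical CA: the mean feature over all joints (the paper learns it instead)."""
    n, dim = len(feats), len(feats[0])
    return [sum(f[d] for f in feats) / n for d in range(dim)]


def global_relations(feats: Sequence[Vec]) -> List[Vec]:
    """Each joint expressed relative to the shared attractor, so all joints are related in one step."""
    ca = coordination_attractor(feats)
    return [[x - c for x, c in zip(f, ca)] for f in feats]


def local_relations(feats: Sequence[Vec],
                    edges: Sequence[Tuple[int, int]]) -> Dict[Tuple[int, int], Vec]:
    """Pairwise differences along skeletal edges only (local joint interactions)."""
    return {(i, j): [a - b for a, b in zip(feats[i], feats[j])] for i, j in edges}


if __name__ == "__main__":
    # Toy 4-joint chain 0-1-2-3 with 1D features; only joint 3 is perturbed.
    feats = [[0.0], [1.0], [2.0], [3.0]]
    moved = [[0.0], [1.0], [2.0], [7.0]]
    edges = [(0, 1), (1, 2), (2, 3)]

    g0, g1 = global_relations(feats), global_relations(moved)
    l0, l1 = local_relations(feats, edges), local_relations(moved, edges)
    print(sum(a != b for a, b in zip(g0, g1)))   # joints whose global relation changed → 4
    print(sum(l0[e] != l1[e] for e in edges))    # edges whose local relation changed → 1
```

Perturbing only joint 3 changes the attractor-relative feature of every joint but only one edge's local relation. In a skeleton graph, that change would reach distant joints only after several propagation steps, which is the part-to-whole, asynchronous learning the abstract describes.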
