Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction
Jiexin Wang, Yiju Guo, Bing Su
TL;DR
This work tackles 3D skeleton-based human motion prediction by explicitly decoupling temporal and spatial information into orthogonal branches and enforcing cross-domain consistency through a spatio-temporal information constraint $\mathcal{L}_{ST}$. It introduces the Spatial-Temporal Multi-Subgraph Graph Convolutional Network (STMS-GCN), which also uses multiple trainable subgraph kernels, coupled through a homogeneous information constraint, to capture richer motion patterns. Future poses are predicted with a temporal decoder $G_T$ and a spatial decoder $G_S$, and the final output is taken from the spatial branch as $\widetilde{\mathbf{Y}}=\mathbf{Y}^{S,L}$. Training uses a composite loss made up of $\mathcal{L}_1$, $\mathcal{L}_{ST}$, $\mathcal{L}_{con}^{S}$, and $\mathcal{L}_{con}^{T}$. On Human3.6M and CMU Mocap, the model achieves state-of-the-art mean per joint position error (MPJPE) across multiple prediction horizons, and ablations confirm the benefits of the orthogonal branches, cross-domain interaction, and multi-subgraph learning.
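The evaluation metric and the composite objective named above can be sketched in plain Python. MPJPE is the standard average Euclidean distance between predicted and ground-truth 3D joints. The rest is an illustrative assumption: this summary does not give the exact forms or weights of $\mathcal{L}_1$, $\mathcal{L}_{ST}$, $\mathcal{L}_{con}^{S}$ and $\mathcal{L}_{con}^{T}$. The function `composite_loss`, its weights `lam_st`, `lam_s` and `lam_t`, and the choice to measure both $\mathcal{L}_1$ and $\mathcal{L}_{ST}$ as joint-position errors are hypothetical. The two consistency terms are passed in as precomputed scalars.

```python
import math

# A pose sequence is a list of frames; each frame is a list of (x, y, z) joints.

def mpjpe(pred, target):
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth joints over all frames and joints."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for frame_p, frame_t in zip(pred, target):
        if len(frame_p) != len(frame_t):
            raise ValueError("frames must have the same number of joints")
        for joint_p, joint_t in zip(frame_p, frame_t):
            total += math.dist(joint_p, joint_t)
            count += 1
    if count == 0:
        raise ValueError("empty pose sequence")
    return total / count

def composite_loss(y_spatial, y_temporal, target, con_s, con_t,
                   lam_st=1.0, lam_s=1.0, lam_t=1.0):
    """Hypothetical weighted sum of the four loss terms listed in the TL;DR."""
    # Assumption: L_1 is a joint-position error on the final (spatial-branch) output.
    l_1 = mpjpe(y_spatial, target)
    # Assumption: L_ST penalises disagreement between spatial and temporal outputs.
    l_st = mpjpe(y_spatial, y_temporal)
    # con_s / con_t stand in for the subgraph consistency terms L_con^S and L_con^T.
    return l_1 + lam_st * l_st + lam_s * con_s + lam_t * con_t
```

With all weights at 1 and identical branch outputs, the cross-domain term vanishes and the loss reduces to the reconstruction error plus the two consistency terms.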
Abstract
Human motion prediction (HMP) involves forecasting future human motion from historical observations. Graph Convolutional Networks (GCNs) have attracted wide attention in this field for their ability to capture relationships among joints in human motion. However, existing GCN-based methods tend to focus on either temporal-domain or spatial-domain features, or they combine the two without fully exploiting their complementarity and cross-dependency. In this paper, we propose the Spatial-Temporal Multi-Subgraph Graph Convolutional Network (STMS-GCN) to capture complex spatio-temporal dependencies in human motion. Specifically, we decouple the modeling of temporal and spatial dependencies and enable cross-domain knowledge transfer at multiple scales through a spatio-temporal information consistency constraint mechanism. In addition, we use multiple subgraphs to extract richer motion information and strengthen the associations learned across different subgraphs through a homogeneous information constraint mechanism. Extensive experiments on standard HMP benchmarks demonstrate the superiority of our method.
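A minimal sketch of how multiple subgraph kernels and decoupled branches might be realized, assuming a common GCN formulation rather than the authors' exact layer: each layer sums $K$ graph convolutions, $\sum_{k} \mathbf{A}_k \mathbf{X} \mathbf{W}_k$, one per subgraph adjacency $\mathbf{A}_k$. The temporal branch treats frames as graph nodes; the spatial branch treats joint coordinates as nodes by operating on the transposed input. The function names, the $T \times D$ input layout (frames by flattened joint coordinates), and the use of fixed lists in place of trainable parameters are hypothetical, and the homogeneous information constraint is not modeled.

```python
def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def multi_subgraph_gcn(x, adjacencies, weights):
    """Hypothetical multi-subgraph graph convolution: Y = sum_k A_k X W_k.

    x           : N x C_in node features
    adjacencies : K subgraph matrices, each N x N (trainable in a real model)
    weights     : K weight matrices, each C_in x C_out
    """
    if not adjacencies or len(adjacencies) != len(weights):
        raise ValueError("need one weight matrix per subgraph, and at least one")
    out = None
    for a_k, w_k in zip(adjacencies, weights):
        term = matmul(matmul(a_k, x), w_k)  # propagate over subgraph k, then project
        out = term if out is None else [
            [u + v for u, v in zip(row_o, row_t)] for row_o, row_t in zip(out, term)
        ]
    return out

def temporal_branch(x, adjs_t, ws_t):
    # Nodes are frames: x is T x D, each A_k is T x T.
    return multi_subgraph_gcn(x, adjs_t, ws_t)

def spatial_branch(x, adjs_s, ws_s):
    # Nodes are joint coordinates: operate on the D x T view, each A_k is D x D.
    return multi_subgraph_gcn(transpose(x), adjs_s, ws_s)
```

Because the spatial branch transposes its input, the two branches mix information along orthogonal axes of the same pose sequence, which is one possible reading of the decoupled temporal and spatial modeling described above.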
