Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction

Jiexin Wang, Yiju Guo, Bing Su

TL;DR

This work tackles 3D skeleton-based human motion prediction by explicitly decoupling temporal and spatial information into orthogonal branches and enforcing cross-domain consistency via a spatio-temporal information constraint $\mathcal{L}_{ST}$. It introduces the Spatial-Temporal Multi-Subgraph Graph Convolutional Network (STMS-GCN), which also employs multiple trainable subgraph kernels with a homogeneous information constraint to capture rich motion patterns. The model predicts future poses by combining a temporal decoder $G_T$ and a spatial decoder $G_S$, with the final output taken from the spatial branch as $\widetilde{\mathbf{Y}}=\mathbf{Y}^{S,L}$, and is trained using a composite loss that includes $\mathcal{L}_1$, $\mathcal{L}_{ST}$, $\mathcal{L}_{con}^{S}$, and $\mathcal{L}_{con}^{T}$. Empirical results on Human3.6M and CMU Mocap show state-of-the-art MPJPE across multiple horizons, with ablations confirming the benefits of orthogonal branches, cross-domain interaction, and multi-subgraph learning.
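The results above are reported in MPJPE (Mean Per Joint Position Error), which is conventionally the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal sketch of that metric follows. The function name `mpjpe`, the array layout `(frames, joints, 3)`, and the toy data are assumptions made for illustration and are not taken from the paper's code.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean Per Joint Position Error.

    Assumed layout (not from the paper): arrays of shape (frames, joints, 3),
    in the same units as the dataset (commonly millimetres).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance per joint and frame, then the mean over all of them.
    per_joint_error = np.linalg.norm(pred - target, axis=-1)
    return float(per_joint_error.mean())


# Toy data: every predicted joint is offset by (3, 4, 0) from the target,
# so each per-joint distance is sqrt(3^2 + 4^2) = 5.
target = np.zeros((2, 3, 3))
pred = target.copy()
pred[..., 0] = 3.0
pred[..., 1] = 4.0
example_error = mpjpe(pred, target)
```

Benchmark protocols usually evaluate this quantity separately at each prediction horizon, which amounts to calling the function on the frame of interest rather than on the whole sequence.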

Abstract

Human motion prediction (HMP) involves forecasting future human motion based on historical data. Graph Convolutional Networks (GCNs) have garnered widespread attention in this field for their proficiency in capturing relationships among joints in human motion. However, existing GCN-based methods tend to focus on either temporal-domain or spatial-domain features, or they combine spatio-temporal features without fully leveraging the complementarity and cross-dependency of these two features. In this paper, we propose the Spatial-Temporal Multi-Subgraph Graph Convolutional Network (STMS-GCN) to capture complex spatio-temporal dependencies in human motion. Specifically, we decouple the modeling of temporal and spatial dependencies, enabling cross-domain knowledge transfer at multiple scales through a spatio-temporal information consistency constraint mechanism. Besides, we utilize multiple subgraphs to extract richer motion information and enhance the learning associations of diverse subgraphs through a homogeneous information constraint mechanism. Extensive experiments on the standard HMP benchmarks demonstrate the superiority of our method.
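To make the multi-subgraph idea concrete, here is a rough sketch of a graph convolution that applies several trainable adjacency matrices in parallel. It is only an illustration under stated assumptions: the abstract does not specify how subgraph outputs are combined, so the averaging step, the function name `multi_subgraph_gcn`, and all shapes below are hypothetical rather than the authors' formulation.

```python
import numpy as np


def multi_subgraph_gcn(x, adjacencies, weights):
    """One graph-convolution step over K subgraphs (illustrative only).

    x:           (N, C_in)        node features, e.g. one row per joint
    adjacencies: (K, N, N)        one trainable adjacency matrix per subgraph
    weights:     (K, C_in, C_out) one weight matrix per subgraph

    Computes A_k @ x @ W_k for each subgraph k, then averages over k.
    The averaging is an assumption; the paper may combine subgraphs differently.
    """
    per_subgraph = np.einsum("knm,mc,kcd->knd", adjacencies, x, weights)
    return per_subgraph.mean(axis=0)


# Toy sizes chosen arbitrarily for the example.
rng = np.random.default_rng(0)
num_nodes, c_in, c_out, num_subgraphs = 5, 3, 4, 2
x = rng.normal(size=(num_nodes, c_in))
adjacencies = rng.normal(size=(num_subgraphs, num_nodes, num_nodes))
weights = rng.normal(size=(num_subgraphs, c_in, c_out))
y = multi_subgraph_gcn(x, adjacencies, weights)  # shape (5, 4)
```

In a trained model the adjacency and weight arrays would be learned parameters, and a consistency constraint such as the homogeneous information constraint mentioned above would be applied across the K subgraphs; that part is not sketched here.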

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison of prediction methods. (a): Individual modeling of spatial or temporal dependencies. (b): Mixing spatio-temporal dependencies modeling. (c): Decoupling motion data modeling into temporal and spatial domains, fusing features for final prediction. (d): Ours leverages temporal-domain learning to assist the learning of the spatial domain, distilling the learned cross-domain knowledge into interactions across multiple scales (red dashed lines).
  • Figure 2: Illustration of STMS-GCN.
  • Figure 3: Comparison of the predictive performance.
  • Figure 4: Different consistency constraints in the multi-subgraph learning are applied to weight parameters "W" or adjacency matrices "A".