Context-based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting
Edgar Medina, Leyong Loh, Namrata Gurung, Kyung Hun Oh, Niels Heller
TL;DR
This work addresses the challenge of predicting future 3D human poses while also offering interpretable insights into the learned spatio-temporal relationships. The authors introduce CIST-GCN, a context-based, interpretable spatio-temporal graph convolutional network that learns sample-specific adjacency and feature-importance representations through components such as DST-GCN, DAE, GaNet, and ConNet, with an Atrous Pyramid TCN decoder. Across Human3.6M, AMASS, 3DPW, and ExPI, the model achieves competitive or state-of-the-art MPJPE performance and demonstrates robustness to out-of-distribution perturbations, while providing explicit interpretability via feature importance vectors and maps. This combination of accuracy, robustness, and built-in explanations enhances practical applicability in real-world motion understanding and analysis.
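For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal, dependency-free sketch of that definition; the function name `mpjpe` and the nested-list layout `[frames][joints][3]` are assumptions made for illustration and are not taken from the authors' code.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: nested lists shaped [frames][joints][3]
    (hypothetical layout chosen for this sketch).
    Returns the mean Euclidean distance over all frames and joints.
    """
    total = 0.0
    count = 0
    for pred_frame, target_frame in zip(pred, target):
        for pred_joint, target_joint in zip(pred_frame, target_frame):
            # Euclidean distance between one predicted and one true joint.
            total += math.dist(pred_joint, target_joint)
            count += 1
    if count == 0:
        raise ValueError("empty pose sequence")
    return total / count


# One frame, two joints: the per-joint errors are 3 and 4.
example_pred = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
example_true = [[[0.0, 0.0, 3.0], [1.0, 4.0, 0.0]]]
example_error = mpjpe(example_pred, example_true)
```

The value is expressed in the same unit as the joint coordinates, so coordinates given in millimeters yield an error in millimeters.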
Abstract
Human motion prediction remains an open problem of great importance for autonomous driving and safety applications. Because of the complex spatio-temporal relations in motion sequences, it is challenging not only to predict movement but also to provide a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph Convolutional Network (CIST-GCN), an efficient GCN-based model for 3D human pose forecasting whose dedicated layers aid model interpretability and provide information that may be useful when analyzing motion distribution and body behavior. Our architecture extracts meaningful information from pose sequences, aggregates displacements and accelerations into the model input, and finally predicts the output displacements. Extensive experiments on the Human3.6M, AMASS, 3DPW, and ExPI datasets demonstrate that CIST-GCN outperforms previous methods in human motion prediction and robustness. Because enhancing interpretability for motion prediction has its merits, we showcase experiments in this direction and provide preliminary evaluations of the resulting insights. Code is available at https://github.com/QualityMinds/cistgcn
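The abstract describes feeding displacements and accelerations to the model and predicting output displacements. One plausible reading, sketched below, treats displacements as first-order frame differences, accelerations as second-order differences, and recovers future poses by accumulating predicted displacements onto the last observed pose. This is an assumption for illustration only: the abstract does not define these quantities, and the helper names (`frame_differences`, `build_inputs`, `integrate_displacements`) and the flat per-frame coordinate layout are hypothetical.

```python
def frame_differences(frames):
    """First-order finite difference between consecutive frames.

    frames: list of frames, each a flat list of joint coordinates.
    Returns a list with one fewer frame than the input.
    """
    return [
        [curr_value - prev_value for prev_value, curr_value in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def build_inputs(poses):
    """Derive displacement and acceleration sequences from observed poses."""
    displacements = frame_differences(poses)          # T - 1 frames
    accelerations = frame_differences(displacements)  # T - 2 frames
    return displacements, accelerations


def integrate_displacements(last_pose, predicted_displacements):
    """Turn predicted displacements back into absolute poses."""
    poses = []
    current = list(last_pose)
    for step in predicted_displacements:
        # Each predicted displacement moves the pose one frame forward.
        current = [value + delta for value, delta in zip(current, step)]
        poses.append(current)
    return poses


# Three observed frames of a toy "pose" with two coordinates.
observed = [[0.0, 0.0], [1.0, 2.0], [3.0, 6.0]]
displacements, accelerations = build_inputs(observed)
future = integrate_displacements(observed[-1], [[1.0, 1.0], [2.0, 2.0]])
```

Predicting displacements rather than absolute positions keeps the output anchored to the last observed pose, which is why the reconstruction step is a simple running sum.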
