Toward Reliable Human Pose Forecasting with Uncertainty

Saeed Saadatnejad; Mehrshad Mirmohammadi; Matin Daghyani; Parham Saremi; Yashar Zoroofchi Benisi; Amirhossein Alimohammadi; Zahra Tehraninasab; Taylor Mordan; Alexandre Alahi

TL;DR

This work tackles the lack of unified evaluation and uncertainty analysis in human pose forecasting by introducing UnPOSed, an open-source library with standardized datasets and metrics. It addresses two kinds of uncertainty. Aleatoric uncertainty is modeled with priors that steer learning toward shorter horizons, improving short-horizon accuracy by up to 25% without degrading longer horizons. Epistemic uncertainty is quantified with a model-agnostic, clustering-based metric (EpU) derived from latent motion representations, which enables reliable out-of-distribution detection. The approach combines an uncertainty-aware loss with time–joint priors and a deep embedded clustering framework that estimates EpU. Experiments on Human3.6M, AMASS, and 3DPW show improved forecasting and better uncertainty estimation. Overall, the work contributes practical uncertainty handling for pose forecasting and a standardized, extensible benchmark to promote uncertainty-aware development in the field.
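
A minimal sketch of an uncertainty-aware loss with a time–joint prior follows. It assumes a Laplace-style heteroscedastic form in which each future frame t and joint j has a log-scale σ_{t,j}. The joint's position error is multiplied by exp(-σ_{t,j}), and σ_{t,j} is added as a penalty, so hard-to-predict entries are down-weighted at a cost. As a stated assumption, the prior fixes the shape σ_{t,j} = a_j · log(1 + t) with one learned scale a_j per joint. Under this prior, uncertainty grows with the horizon, and only J scalars are learned instead of T·J. The functions `prior_sigma` and `uncertainty_aware_loss` are hypothetical names, not the UnPOSed API.

```python
import math
from typing import List, Sequence

Pose = Sequence[Sequence[float]]  # one pose: J joints, each a 3D point (x, y, z)


def prior_sigma(per_joint_scale: Sequence[float], t: int) -> List[float]:
    """Hypothetical time-joint prior: log-uncertainty of joint j grows as a_j * log(1 + t)."""
    return [a_j * math.log(1.0 + t) for a_j in per_joint_scale]


def uncertainty_aware_loss(pred: Sequence[Pose], target: Sequence[Pose],
                           per_joint_scale: Sequence[float]) -> float:
    """Mean of ||e_tj|| * exp(-sigma_tj) + sigma_tj over future frames t and joints j."""
    total, count = 0.0, 0
    # Frames are indexed from t = 1 (the first forecast frame).
    for t, (pose_hat, pose) in enumerate(zip(pred, target), start=1):
        sigmas = prior_sigma(per_joint_scale, t)  # one sigma per joint at this frame
        for joint_hat, joint, sigma in zip(pose_hat, pose, sigmas):
            err = math.dist(joint_hat, joint)  # Euclidean error of one joint (Python 3.8+)
            total += err * math.exp(-sigma) + sigma  # attenuated error plus penalty on sigma
            count += 1
    return total / count
```

With every a_j set to zero, the loss reduces to the plain mean per-joint position error. Positive scales attenuate errors at later horizons, which is one way a prior can push training toward the shorter horizons.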

Abstract

Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting that includes multiple models, supports several datasets, and employs standardized evaluation metrics, with the aim of promoting research and moving toward unified and consistent evaluation. Second, we model two types of uncertainty in the problem to increase performance and make predictions more trustworthy. 1) We propose a method for modeling aleatoric uncertainty that uses uncertainty priors to inject knowledge about the pattern of uncertainty. This focuses the model's capacity on more meaningful supervision while reducing the number of learned parameters and improving training stability. 2) We introduce a novel approach for quantifying the epistemic uncertainty of any model by clustering and measuring the entropy of its cluster assignments. On the Human3.6M, AMASS, and 3DPW datasets, our experiments show up to $25\%$ improvement in forecasting at short horizons with no loss at longer horizons, as well as better performance in uncertainty estimation. The code is available online at https://github.com/vita-epfl/UnPOSed.
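
The idea of measuring the entropy of cluster assignments can be sketched in a few lines. The soft assignment below uses the Student's t kernel from deep embedded clustering (DEC, Xie et al., 2016) with α = 1. Whether UnPOSed uses exactly this kernel and this unnormalized entropy for EpU is an assumption. The latent code `z` and the `centroids` stand in for the outputs of a trained motion encoder and are hypothetical inputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_assignments(z: Vector, centroids: Sequence[Vector], alpha: float = 1.0) -> List[float]:
    """DEC-style Student's t similarity between latent code z and each centroid, normalized."""
    kernels = []
    for mu in centroids:
        sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
        kernels.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
    norm = sum(kernels)
    return [k / norm for k in kernels]  # probabilities over clusters, summing to 1


def assignment_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a cluster-assignment distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A code close to one centroid yields a peaked distribution and low entropy. A code far from every learned motion cluster yields nearly uniform assignments and high entropy, which makes the score usable for out-of-distribution detection. Dividing by log K, for K clusters, is one optional way to map the score into [0, 1].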

Paper Structure (23 sections, 14 equations, 10 figures, 7 tables)

Introduction
Related works
Aleatoric uncertainty in pose forecasting
Epistemic uncertainty in pose forecasting
Determining the number of motion clusters
Deep embedded clustering
Estimating epistemic uncertainty
Experiments
Datasets and Metrics
Baselines
Aleatoric uncertainty
Epistemic uncertainty
Conclusion
Acknowledgment
Appendix
...and 8 more sections

Figures (10)

Figure 1: We propose to model two kinds of uncertainty: 1) aleatoric uncertainty, reflecting how uncertainty inherently grows over time, shown for the left person with lighter colors and thicker bones as time progresses; 2) epistemic uncertainty, used to detect invalid, out-of-distribution forecast poses caused by scenarios unseen during training, exemplified by the right person.
Figure 2: Our LSTM encoder-decoder encodes the motion into a well-clustered representation space $Z$. Deep embedded clustering on that space provides the cluster-assignment probabilities from which the epistemic uncertainty is estimated.
Figure 3: ST-Trans consists of 2 MLP layers and 6 Transformer Blocks with skip connections. Each Transformer Block contains two cascaded temporal and spatial transformers to capture the spatio-temporal features of data.
Figure 4: Qualitative forecast poses on Human3.6M [h36m] for different actions. For each action, time progresses from left to right, and higher aleatoric uncertainty is shown with a lighter color. Each bone is assigned the uncertainty of its outer joint, with the hip taken as the body center. The estimated uncertainty increases over time, and joints farther from the body center have higher uncertainties.
Figure 5: A-MPJPE and its standard deviation across training epochs for 5 trained models. The model trained with pUAL has a lower standard deviation, indicating more stable training.
...and 5 more figures
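
The visualization rule in Figure 4 takes the hip as the body center and gives each bone the uncertainty of its outer joint. That rule amounts to a lookup along the kinematic tree. The sketch below assumes the skeleton is given as a parent array rooted at the hip. The min-max scaling to a lightness value is an illustrative assumption. The five-joint chain and the uncertainty values in the example are made up and are not the Human3.6M skeleton.

```python
from typing import Dict, List, Sequence, Tuple

Bone = Tuple[int, int]  # (inner joint nearer the hip, outer joint farther from it)


def bone_uncertainty(parents: Sequence[int], joint_unc: Sequence[float]) -> Dict[Bone, float]:
    """Give each bone the uncertainty of its outer (child) joint; the hip root has parent -1."""
    bones: Dict[Bone, float] = {}
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the hip is the body center and is not the outer end of any bone
        bones[(parent, child)] = joint_unc[child]
    return bones


def to_lightness(values: List[float]) -> List[float]:
    """Min-max scale uncertainties to [0, 1]; larger values would be drawn in lighter colors."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


if __name__ == "__main__":
    # Hypothetical 5-joint skeleton: hip(0) -> knee(1) -> ankle(2), hip(0) -> spine(3) -> head(4)
    parents = [-1, 0, 1, 0, 3]
    joint_unc = [0.0, 0.2, 0.5, 0.1, 0.3]  # made-up per-joint uncertainties
    bones = bone_uncertainty(parents, joint_unc)
    print(bones)  # {(0, 1): 0.2, (1, 2): 0.5, (0, 3): 0.1, (3, 4): 0.3}
    print(to_lightness(list(bones.values())))
```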
