Towards Scalable Probabilistic Human Motion Prediction with Gaussian Processes for Safe Human-Robot Collaboration

Jinger Chong, Xiaotong Zhang, Kamal Youcef-Toumi

TL;DR

The results demonstrate that scalable GP-based models can deliver competitive accuracy together with reliable and interpretable uncertainty estimates for downstream robotics tasks such as motion planning and collision avoidance.

Abstract

Accurate human motion prediction with well-calibrated uncertainty is critical for safe human-robot collaboration (HRC), where robots must anticipate and react to human movements in real time. We propose a structured multitask variational Gaussian Process (GP) framework for full-body human motion prediction that captures temporal correlations and leverages joint-dimension-level factorization for scalability, while using a continuous 6D rotation representation to preserve kinematic consistency. Evaluated on Human3.6M (H3.6M), our model achieves up to 50 lower kernel density estimate negative log-likelihood (KDE NLL) than strong baselines, a mean continuous ranked probability score (CRPS) of 0.021 m, and deterministic mean angle error (MAE) that is 3-18% higher than competitive deep learning methods. Empirical coverage analysis shows that the fraction of ground-truth outcomes contained within predicted confidence intervals gradually decreases with horizon, remaining conservative for lower-confidence intervals and near-nominal for higher-confidence intervals, with only modest calibration drift at longer horizons. Despite its probabilistic formulation, our model requires only 0.24-0.35 M parameters, roughly eight times fewer than comparable approaches, and exhibits modest inference times, indicating suitability for real-time deployment. Extensive ablation studies further validated the choice of 6D rotation representation and Matern 3/2 + Linear kernel, and guided the selection of the number of inducing points and latent dimensionality. These results demonstrate that scalable GP-based models can deliver competitive accuracy together with reliable and interpretable uncertainty estimates for downstream robotics tasks such as motion planning and collision avoidance.
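
The continuous 6D rotation representation mentioned above is commonly built by keeping the first two columns of a rotation matrix and recovering the third by Gram-Schmidt orthonormalization. The following is a minimal sketch of that general construction, not the authors' code: the function names `rotation_from_6d` and `rotation_to_6d` are hypothetical, and the column ordering is an assumption, since this summary does not state the paper's exact convention.

```python
import math


def _normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def rotation_from_6d(rep):
    """Map a 6D vector (two stacked 3D vectors a1, a2) to a 3x3 rotation matrix.

    Assumed convention: a1 and a2 are the (possibly noisy) first two columns.
    The result is a list of rows whose columns are the orthonormal b1, b2, b3.
    """
    a1, a2 = rep[:3], rep[3:6]
    b1 = _normalize(a1)
    # Remove the component of a2 along b1, then normalize (Gram-Schmidt step).
    proj = _dot(b1, a2)
    b2 = _normalize([a - proj * b for a, b in zip(a2, b1)])
    # The cross product completes a right-handed frame, so det(R) = +1.
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def rotation_to_6d(R):
    """Keep the first two columns of a rotation matrix as a flat 6D vector."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]
```

Because any pair of non-parallel, non-zero 3D vectors maps to a valid rotation, a model can predict the six numbers freely (for example, as GP outputs) and still obtain a proper rotation matrix after this projection. This is the sense in which the representation helps preserve kinematic consistency.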

Paper Structure (23 sections, 5 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Model architecture illustrated for a single joint with $D$ dimensions. Each joint–dimension pair is modeled by a GP that maps $H$ past time steps to $F$ future time steps and produces a Gaussian predictive distribution at each future time step. This structure is replicated for all joints, resulting in 96 parallel GPs after preprocessing.
  • Figure 2: KDE NLL for different rotation representations evaluated in our ablation study. Exponential map and quaternion representations show similar performance, while 6D rotation consistently achieves the lowest KDE NLL, most evidently at shorter horizons.
  • Figure 3: KDE NLL of our final model compared to Motron and DLow. Our model consistently achieves lower KDE NLL across all time steps, indicating stronger probabilistic performance.
  • Figure 4: Empirical coverage of predicted joint positions at 50%, 80%, and 95% confidence intervals across prediction horizons. The 50% interval is highly conservative at short horizons and gradually decreases while remaining above nominal at longer horizons. The 80% interval shows slight overestimation early on and stabilizes over time, while the 95% interval remains close to nominal throughout. Overall, the model exhibits modest calibration drift with horizon, maintaining reliable uncertainty estimates.
  • Figure 5: Visualization of 50 sampled skeleton predictions drawn from the predicted distributions at different horizons. The solid skeleton denotes the ground-truth motion, while the translucent skeletons represent sampled predictions. Example shown for Subject S9 performing the walking action (first sequence). Colors indicate body sides (blue: right, orange: left) for visual clarity. The increasing spread of samples at longer horizons reflects the growth of predictive uncertainty over time.
  • ...and 1 more figures
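
The empirical coverage reported in Figure 4 can be illustrated with a small sketch. It assumes independent Gaussian predictive marginals per scalar coordinate and central confidence intervals. The paper's exact evaluation procedure is not given in this summary, so the function name `empirical_coverage` and its inputs are hypothetical.

```python
from statistics import NormalDist


def empirical_coverage(means, stds, targets, level):
    """Fraction of targets that fall inside the central `level` interval.

    means, stds: predicted Gaussian mean and standard deviation per point.
    targets: ground-truth values aligned with the predictions.
    level: nominal confidence level in (0, 1), e.g. 0.5, 0.8, or 0.95.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if len(targets) == 0:
        raise ValueError("need at least one target")
    # Half-width of the central interval, in units of standard deviations.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(
        1 for m, s, y in zip(means, stds, targets) if abs(y - m) <= z * s
    )
    return hits / len(targets)


# Toy data: four predictions with zero mean and unit standard deviation.
means = [0.0, 0.0, 0.0, 0.0]
stds = [1.0, 1.0, 1.0, 1.0]
targets = [0.0, 0.5, 1.0, 3.0]
for level in (0.5, 0.8, 0.95):
    print(level, empirical_coverage(means, stds, targets, level))
```

Coverage above the nominal level means the intervals are conservative (wider than needed), and coverage below it means the model is overconfident. Computing this quantity separately at each prediction horizon gives curves of the kind the Figure 4 caption describes.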