MMCM: Multimodality-aware Metric using Clustering-based Modes for Probabilistic Human Motion Prediction
Kyotaro Tokoro, Hiromu Taketsugu, Norimichi Ukita
TL;DR
MMCM addresses the ill-posed nature of human motion prediction by introducing a clustering-based, multimodality-aware evaluation metric. It defines motion modes by reducing dimensionality with an autoencoder followed by UMAP and then clustering with HDBSCAN, and it uses multimodal ground truths (MMGTs) to identify the modes that are valid for a given past. The metric computes a Mode Coverage Rate $C$ and a Mode Validity Rate $V$ and combines them via their harmonic mean to form MMCM, thereby rewarding predictions that cover all plausible modes while staying within valid ones. Experiments on Human3.6M and AMASS indicate that MMCM assesses multimodal predictions in a more principled way than existing diversity metrics, is robust to abnormal sequences, and has reasonable computational cost. The metric can guide the development of probabilistic HMP models by measuring both the breadth and the correctness of multimodal predictions.
Abstract
This paper proposes a novel metric for Human Motion Prediction (HMP). Since a single past sequence can lead to multiple possible futures, a probabilistic HMP method predicts multiple future motions. While a single motion predicted by a deterministic method is evaluated only by its difference from the ground-truth motion, multiple predicted motions should also be evaluated based on their distribution. For this evaluation, this paper focuses on the following two criteria. \textbf{(a) Coverage}: motions should be distributed among multiple motion modes to cover diverse possibilities. \textbf{(b) Validity}: motions should be kinematically valid as future motions observable from a given past motion. However, existing metrics simply reward widely distributed motions, even if these motions fall within a single mode and are kinematically invalid. To resolve these disadvantages, this paper proposes a Multimodality-aware Metric using Clustering-based Modes (MMCM). For (a) coverage, MMCM divides a motion space into several clusters, each of which is regarded as a mode. These modes are used to explicitly evaluate whether predicted motions are distributed among multiple modes. For (b) validity, MMCM identifies valid modes by collecting possible future motions from a motion dataset. Our experiments validate that our clustering yields sensible mode definitions and that MMCM accurately scores multimodal predictions. Code: https://github.com/placerkyo/MMCM
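To make the combination of coverage and validity concrete, the following is a minimal sketch and not the released implementation. It assumes that each predicted motion has already been assigned an integer mode label and that the set of valid modes for the given past is known. The exact definitions of $C$ and $V$ are not given in this summary, so the sketch assumes that $C$ is the fraction of valid modes hit by at least one prediction and that $V$ is the fraction of predictions that fall in a valid mode. The function name `mmcm_score` and its arguments are hypothetical. Only the harmonic-mean combination is taken from the text above.

```python
from typing import Dict, Iterable, Set


def mmcm_score(pred_modes: Iterable[int], valid_modes: Set[int]) -> Dict[str, float]:
    """Illustrative MMCM-style score (hypothetical helper, not the authors' code).

    pred_modes:  mode label assigned to each predicted motion.
    valid_modes: mode labels considered valid for the given past motion.
    """
    preds = list(pred_modes)
    if not preds or not valid_modes:
        raise ValueError("need at least one prediction and one valid mode")

    # Assumed coverage C: share of valid modes reached by at least one prediction.
    covered = valid_modes.intersection(preds)
    coverage = len(covered) / len(valid_modes)

    # Assumed validity V: share of predictions that land in a valid mode.
    validity = sum(m in valid_modes for m in preds) / len(preds)

    # Harmonic mean of C and V; defined as 0 when both are 0.
    denom = coverage + validity
    score = 0.0 if denom == 0 else 2.0 * coverage * validity / denom
    return {"C": coverage, "V": validity, "MMCM": score}


if __name__ == "__main__":
    # Four predictions: three fall in valid modes 0 and 1, one in invalid mode 5.
    print(mmcm_score([0, 0, 1, 5], {0, 1, 2}))
```

Under these assumptions, predictions that are widely spread but land in invalid modes lower $V$, while predictions concentrated in one valid mode lower $C$. Because the harmonic mean is dominated by the smaller of the two rates, a high score requires both to be high.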
