MMCM: Multimodality-aware Metric using Clustering-based Modes for Probabilistic Human Motion Prediction

Kyotaro Tokoro, Hiromu Taketsugu, Norimichi Ukita

TL;DR

MMCM addresses the ill-posed nature of human motion prediction by introducing a clustering-based, multimodality-aware evaluation metric. It defines motion modes via autoencoder–UMAP dimensionality reduction followed by HDBSCAN clustering, and uses multimodal ground truths (MMGTs) to identify which modes are valid for a given past. The metric computes a Mode Coverage Rate $C$ and a Mode Validity Rate $V$ and combines them as their harmonic mean to form MMCM, thereby rewarding predictions that cover all plausible modes while staying within valid ones. Experiments on Human3.6M and AMASS demonstrate that MMCM provides a more principled assessment of multimodal predictions than existing diversity metrics, with robustness to abnormal sequences and reasonable computational efficiency. This approach can guide the development of probabilistic HMP models by highlighting both the breadth and the correctness of multimodal predictions.
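The combination step described above can be sketched in a few lines of Python. This summary does not state the exact formulas for $C$ and $V$, so the definitions below are assumptions made for illustration: $C$ is taken as the fraction of valid modes that contain at least one prediction, and $V$ as the fraction of predictions that fall in a valid mode. The function name `mmcm_score` and its arguments are hypothetical and are not taken from the authors' code. Only the harmonic mean, $2CV/(C+V)$, follows directly from the text.

```python
from typing import Iterable, Tuple


def mmcm_score(pred_modes: Iterable[int],
               valid_modes: Iterable[int]) -> Tuple[float, float, float]:
    """Return (C, V, MMCM) for one past sequence.

    pred_modes  : mode id assigned to each predicted motion
    valid_modes : ids of the modes regarded as valid for this past

    ASSUMED definitions (not confirmed by the source text):
      C = |valid modes hit by at least one prediction| / |valid modes|
      V = |predictions that fall in a valid mode| / |predictions|
    """
    preds = list(pred_modes)
    valid = set(valid_modes)
    if not preds or not valid:
        raise ValueError("need at least one prediction and one valid mode")

    # Coverage: how many of the valid modes are reached by the predictions.
    covered = valid & set(preds)
    c = len(covered) / len(valid)

    # Validity: how many predictions stay inside the valid modes.
    v = sum(1 for m in preds if m in valid) / len(preds)

    # Harmonic mean of C and V; defined as 0 when both rates are 0.
    score = 0.0 if (c + v) == 0 else 2.0 * c * v / (c + v)
    return c, v, score


if __name__ == "__main__":
    # Four predictions; modes 0, 1 and 2 are valid, mode 5 is not.
    c, v, score = mmcm_score([0, 0, 1, 5], {0, 1, 2})
    print(f"C={c:.3f}  V={v:.3f}  MMCM={score:.3f}")
```

Under these assumed definitions, the harmonic mean stays low unless both rates are high. Predictions that scatter widely but land outside the valid modes lose on $V$, and predictions that collapse into a single valid mode lose on $C$.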

Abstract

This paper proposes a novel metric for Human Motion Prediction (HMP). Since a single past sequence can lead to multiple possible futures, a probabilistic HMP method predicts such multiple motions. While a single motion predicted by a deterministic method is evaluated only by its difference from the ground truth motion, multiple predicted motions should also be evaluated based on their distribution. For this evaluation, this paper focuses on the following two criteria. (a) Coverage: motions should be distributed among multiple motion modes to cover diverse possibilities. (b) Validity: motions should be kinematically valid as future motions observable from a given past motion. However, existing metrics simply reward widely distributed motions, even if these motions fall in a single mode and are kinematically invalid. To resolve these disadvantages, this paper proposes a Multimodality-aware Metric using Clustering-based Modes (MMCM). For (a) coverage, MMCM divides a motion space into several clusters, each of which is regarded as a mode. These modes are used to explicitly evaluate whether predicted motions are distributed among multiple modes. For (b) validity, MMCM identifies valid modes by collecting possible future motions from a motion dataset. Our experiments validate that our clustering yields sensible mode definitions and that MMCM accurately scores multimodal predictions. Code: https://github.com/placerkyo/MMCM

Paper Structure

This paper contains 22 sections, 3 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Why a new multimodality metric is needed. Three prediction patterns are sketched in a latent "motion mode space", where a motion mode is a set of similar actions and a valid mode is a mode that is valid as a continuation of the past: (i) Unimodal (all samples in a few valid modes), (ii) Spread-out (samples dispersed without regard to the valid modes), and (iii) Multimodal (samples distributed over several valid modes). The table at the bottom shows which prediction pattern each metric scores highly.
  • Figure 2: A large motion dataset is first passed through dimensionality reduction. In the resulting n-dimensional space, clustering detects high-density regions. Each resulting cluster is recorded as a motion mode (Mode 1, Mode 2, Mode 3, $\cdots$).
  • Figure 3: Evaluation by MMCM. The predictions (top) and the MMGTs (bottom, denoted as $\mathbf{Y}^{k}_{\mathrm{mm}}$) are independently passed through the dimensionality reduction to obtain their latent embeddings. Each embedded sequence is assigned to its nearest mode obtained from the motion space clustering. Among all the modes, those that contain MMGT embeddings are regarded as valid modes. MMCM is defined as the harmonic mean of $C$ and $V$, as shown on the right side of the figure.
  • Figure 4: Multimodality comparison between HumanMAC and CoMusion. The left and right columns show motions predicted by HumanMAC and CoMusion, respectively. In each of the smoking and phoning examples, the top row overlays all 50 predictions, while the bottom row extracts one representative motion per valid mode detected by MMCM.
  • Figure 5: Outputs from DLow classified as abnormal by MMCM. Top: an abnormal motion in which the arms become unrealistically elongated. Bottom: an abnormal motion where the torso is horizontally mirrored from right to left.
  • ...and 5 more figures
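
The assignment step in the Figure 3 caption (each embedded sequence goes to its nearest mode, and modes containing MMGT embeddings count as valid) can be illustrated with a small standard-library sketch. The captions do not say how "nearest" is measured, so this example assumes each mode is summarized by a single centroid and uses Euclidean distance in the embedding space. The helper names `assign_to_nearest_mode` and `valid_mode_ids`, and the toy 2-D coordinates, are hypothetical and may differ from the authors' implementation.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def assign_to_nearest_mode(embeddings: Sequence[Point],
                           centroids: Sequence[Point]) -> List[int]:
    """Return, for each embedding, the index of the closest mode centroid.

    ASSUMPTION: a mode is represented by one centroid and closeness is
    Euclidean distance; the source only says "nearest mode".
    """
    if not centroids:
        raise ValueError("at least one mode centroid is required")
    return [
        min(range(len(centroids)), key=lambda k: math.dist(e, centroids[k]))
        for e in embeddings
    ]


def valid_mode_ids(mmgt_embeddings: Sequence[Point],
                   centroids: Sequence[Point]) -> Set[int]:
    """Modes that receive at least one MMGT embedding are treated as valid."""
    return set(assign_to_nearest_mode(mmgt_embeddings, centroids))


if __name__ == "__main__":
    # Toy 2-D embedding space with three modes (illustrative values only).
    centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    predictions = [(1.0, 1.0), (9.0, 1.0), (0.5, 8.0)]
    mmgts = [(0.2, 0.0), (9.5, 0.5)]

    print(assign_to_nearest_mode(predictions, centroids))
    print(sorted(valid_mode_ids(mmgts, centroids)))
```

In this toy case the third prediction lands in a mode that no MMGT reaches, so it would count as falling outside the valid modes.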