Metamorphic Testing of Multimodal Human Trajectory Prediction

Helge Spieker, Nadjib Lazaar, Arnaud Gotlieb, Nassim Belmecheri

TL;DR

The paper tackles the challenge of testing multimodal human trajectory prediction (HTP) models in the absence of a ground-truth oracle by applying metamorphic testing (MT). It introduces TrajTest, an MT framework with five metamorphic relations that transform both past trajectories and semantic BEV maps, coupled with probabilistic violation criteria based on the Wasserstein ($W_2$) and Hellinger ($H_2$) distances, plus a hypothesis-testing criterion for map-altering scenarios. The authors validate TrajTest on the Y-net model using the Stanford Drone Dataset and inD, showing that Wasserstein violations align with ground-truth-based ADE/FDE metrics and that map manipulations reveal robustness and safety-related behavior under contextual changes. Overall, the work demonstrates that MT provides a principled, oracle-free approach to robustness evaluation for HTP components in autonomous systems, enabling systematic fault detection and insights into model biases and invariances.
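As an illustration of the map-side transformations, the sketch below applies a class change (in the spirit of $\text{MR}_{ClsChg}$) and an inserted obstacle (in the spirit of $\text{MR}_{Obs}$) to a semantic map stored as a 2D array of class IDs. This is a minimal sketch, not TrajTest's implementation: the class IDs, the rectangular regions, and the helper `fraction_in_class` (a simple way to measure how many predicted points land on a given class) are all hypothetical.

```python
import numpy as np

# Hypothetical class IDs; each dataset defines its own semantic label set.
ROAD, TERRAIN, OBSTACLE = 1, 2, 3


def change_class(seg, src, dst, rows=slice(None), cols=slice(None)):
    """Class-change transformation: relabel `src` pixels as `dst` inside a window."""
    out = seg.copy()
    window = out[rows, cols]  # basic slicing returns a view, so edits reach `out`
    window[window == src] = dst
    return out


def add_obstacle(seg, top, left, height, width, obstacle_id=OBSTACLE):
    """Obstacle transformation: paint a rectangular obstacle onto a copy of the map."""
    out = seg.copy()
    out[top:top + height, left:left + width] = obstacle_id
    return out


def fraction_in_class(points, seg, cls):
    """Fraction of (x, y) pixel points lying on pixels of class `cls`.

    Coordinates are truncated to integer pixel indices; points outside
    the map count as not hitting the class.
    """
    pts = np.asarray(points, dtype=int)
    if len(pts) == 0:
        return 0.0
    h, w = seg.shape
    xs, ys = pts[:, 0], pts[:, 1]
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    hits = np.zeros(len(pts), dtype=bool)
    hits[inside] = seg[ys[inside], xs[inside]] == cls
    return float(hits.mean())
```

For an obstacle placed on the originally predicted path, one would expect `fraction_in_class(follow_up_predictions, follow_up_map, OBSTACLE)` to stay close to zero; how strictly such an expectation is enforced is a design choice of the violation criterion.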

Abstract

Context: Predicting human trajectories is crucial for the safety and reliability of autonomous systems, such as automated vehicles and mobile robots. However, rigorously testing the underlying multimodal Human Trajectory Prediction (HTP) models, which typically use multiple input sources (e.g., trajectory history and environment maps) and produce stochastic outputs (multiple possible future paths), presents significant challenges. The primary difficulty is the absence of a definitive test oracle, as numerous future trajectories may be plausible for any given scenario. Objectives: This research applies Metamorphic Testing (MT) as a systematic methodology for testing multimodal HTP systems. We address the oracle problem through metamorphic relations (MRs) adapted to the complexity and stochastic nature of HTP. Methods: We present five MRs that transform both historical trajectory data and the semantic segmentation maps used as environmental context. These MRs fall into two groups: 1) label-preserving geometric transformations (mirroring, rotation, rescaling) applied to both trajectory and map inputs, where the outputs are expected to transform correspondingly; and 2) map-altering transformations (changing semantic class labels, introducing obstacles), which are expected to cause predictable changes in the trajectory distributions. We propose probabilistic violation criteria based on distances between probability distributions, such as the Wasserstein or Hellinger distance. Conclusion: This study introduces TrajTest, an MT framework for oracle-less testing of multimodal, stochastic HTP systems. It enables assessment of model robustness against input transformations and contextual changes without relying on ground-truth trajectories.
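The label-preserving MRs apply the same geometric transformation to the inputs and then check that the outputs moved accordingly. A minimal sketch of that check, assuming trajectories are (x, y) pixel coordinates transformed about a fixed center (for example the map center), is shown below. `constant_velocity_model` is a hypothetical deterministic stand-in for an HTP model such as Y-net, and the exact `np.allclose` comparison is a simplification: for stochastic multimodal outputs, the comparison would instead use a probabilistic violation criterion.

```python
import numpy as np


def rotate(points, angle_rad, center):
    """Rotate (..., 2) points by `angle_rad` around `center`."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    return (np.asarray(points, float) - center) @ rot.T + center


def mirror_x(points, center):
    """Mirror (..., 2) points across the vertical line x = center[0]."""
    p = np.asarray(points, float).copy()
    p[..., 0] = 2 * center[0] - p[..., 0]
    return p


def rescale(points, factor, center):
    """Scale (..., 2) points by `factor` around `center`."""
    return (np.asarray(points, float) - center) * factor + center


def constant_velocity_model(past, horizon=12, n_samples=3):
    """Hypothetical HTP stand-in: extrapolates the last step, shape (n_samples, horizon, 2)."""
    past = np.asarray(past, float)
    velocity = past[-1] - past[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    path = past[-1] + steps * velocity
    return np.repeat(path[None], n_samples, axis=0)


def check_equivariance(model, past, transform, atol=1e-6):
    """Metamorphic check: model(T(input)) should match T(model(input))."""
    source_out = model(past)
    follow_up_out = model(transform(past))
    return np.allclose(follow_up_out, transform(source_out), atol=atol)
```

A model that, for instance, adds a fixed offset in image coordinates would fail the rotation check, because the offset does not rotate with the scene.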

Paper Structure

This paper contains 29 sections, 5 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: TrajTest: Metamorphic Testing for multimodal HTP
  • Figure 2: Inputs and Outputs of Human Trajectory Prediction. Data shows the little_1 scene from the Stanford Drone Dataset robicquet2016learning.
  • Figure 3: Example of $\text{MR}_{ClsChg}$ and $\text{MR}_{Obs}$ applied simultaneously on a segmentation map. The terrain area at the bottom is changed to road and an obstacle is added in the pedestrian's initially predicted path. Data shows the little_1 scene from the Stanford Drone Dataset robicquet2016learning.
  • Figure 4: Best-of-N ADE and FDE: The model generates N plausible future paths (here, N=3) from a probability distribution (blue-shaded background, simplified). In Best-of-N, the path with the minimum error is chosen to calculate ADE/FDE. ADE is the average length of all red lines. FDE is the length of the thick red line only.
  • Figure 5: Agreement of WVC violations and ADE/FDE: Dependency between p-value threshold and classification scores for label-preserving MRs; results are aggregated over $\text{MR}_{Mirror}$, $\text{MR}_{Rot}$, and $\text{MR}_{Scale}$.
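The Best-of-N metrics described for Figure 4 can be computed directly from an array of N sampled futures and the ground-truth path. The sketch below assumes predictions of shape (N, T, 2) in the same units as the ground truth, and takes the minimum over samples separately for ADE and FDE, which is a common convention; selecting a single best path for both metrics, as the caption's wording could also suggest, would be a small variation.

```python
import numpy as np


def best_of_n_ade_fde(preds, gt):
    """Best-of-N ADE/FDE for preds of shape (N, T, 2) and gt of shape (T, 2)."""
    preds = np.asarray(preds, float)
    gt = np.asarray(gt, float)
    # Euclidean error of every sample at every timestep: shape (N, T).
    dists = np.linalg.norm(preds - gt[None], axis=-1)
    ade = dists.mean(axis=1).min()  # best average displacement over samples
    fde = dists[:, -1].min()        # best final-step displacement over samples
    return float(ade), float(fde)
```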

Theorems & Definitions (8)

  • definition 1: Metamorphic Relations
  • definition 2: Follow-up Test Case
  • definition 3: Multimodal HTP
  • definition 4: Probabilistic Violation Criterion - PVC
  • definition 5: Wasserstein Distance
  • definition 5: Wasserstein Distance
  • definition 6: Hellinger Distance
  • definition 7: Hypothesis Testing Criterion
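To make the distance-based definitions concrete, the sketch below assumes each set of N sampled endpoints is summarized by a 2D Gaussian and uses the closed-form 2-Wasserstein and Hellinger distances between Gaussians. The Gaussian summary, the helper names, and the fixed-`threshold` decision rule in the hypothetical `pvc_violated` are assumptions for illustration; the paper's criteria (including the p-value-based thresholding referenced in Figure 5) may be defined differently.

```python
import numpy as np


def _sqrtm_psd(m):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_gaussian(samples):
    """Mean and covariance of (N, 2) samples (assumed Gaussian summary)."""
    x = np.asarray(samples, float)
    return x.mean(axis=0), np.cov(x, rowvar=False)


def wasserstein2_gaussian(m1, s1, m2, s2):
    """W2 between N(m1, s1) and N(m2, s2):
    W2^2 = |m1 - m2|^2 + tr(s1 + s2 - 2 (s2^1/2 s1 s2^1/2)^1/2)."""
    r2 = _sqrtm_psd(s2)
    cross = _sqrtm_psd(r2 @ s1 @ r2)
    sq = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2 * cross)
    return float(np.sqrt(max(sq, 0.0)))


def hellinger_gaussian(m1, s1, m2, s2):
    """Hellinger distance H (not squared) between two Gaussians, H^2 = 1 - BC."""
    s = (s1 + s2) / 2
    d = m1 - m2
    coeff = np.linalg.det(s1) ** 0.25 * np.linalg.det(s2) ** 0.25 / np.sqrt(np.linalg.det(s))
    bc = coeff * np.exp(-0.125 * d @ np.linalg.solve(s, d))  # Bhattacharyya coefficient
    return float(np.sqrt(max(1.0 - bc, 0.0)))


def pvc_violated(expected_samples, follow_up_samples, threshold, metric=wasserstein2_gaussian):
    """Hypothetical PVC: violation if the distance exceeds a fixed threshold."""
    m1, s1 = fit_gaussian(expected_samples)
    m2, s2 = fit_gaussian(follow_up_samples)
    return metric(m1, s1, m2, s2) > threshold
```

For a label-preserving MR, `expected_samples` would be the source predictions mapped through the input transformation, and `follow_up_samples` the model's predictions on the transformed input.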