Evaluating Human Trajectory Prediction with Metamorphic Testing

Helge Spieker; Nassim Belmecheri; Arnaud Gotlieb; Nadjib Lazaar

TL;DR

This work tackles the robustness evaluation of human trajectory prediction (HTP) models, which produce stochastic, multimodal outputs and lack a single ground-truth trajectory. It introduces metamorphic testing (MT) for HTP, using label-preserving metamorphic relations (MRs) such as mirroring and rescaling of scene segmentation maps, together with a novel Wasserstein Violation Criterion (WVC) that statistically detects MR violations without ground-truth labels. The WVC compares trajectory distributions via optimal transport and applies a $z$-test to flag significant differences, so faults can be detected without labels. Experiments with the YNet model on the Stanford Drone Dataset show that the violations detected by the WVC are in line with the ground-truth Mean-ADE/Mean-FDE metrics, supporting MT as a practical robustness assessment tool for trajectory prediction in autonomous systems. The approach is a distribution-aware complement to traditional evaluation, with potential to improve reliability in real-world deployments.
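The mirroring relation mentioned above can be illustrated with a minimal sketch. It assumes that a follow-up test case is built by flipping the segmentation map and the observed positions left to right, and that the follow-up predictions are mirrored back into the source frame before comparison. The function names and the `toy_predict` model are hypothetical stand-ins introduced for illustration; they are not YNet and not the authors' implementation.

```python
import numpy as np


def mirror_scene(seg_map):
    """Flip an (H, W) segmentation map left to right."""
    return seg_map[:, ::-1]


def mirror_points(points, width):
    """Mirror (..., 2) pixel coordinates (x, y) across the vertical centre line."""
    mirrored = np.array(points, dtype=float)  # copy, so the input stays untouched
    mirrored[..., 0] = (width - 1) - mirrored[..., 0]
    return mirrored


def toy_predict(seg_map, past, n_samples, steps, rng):
    """Hypothetical stand-in for a stochastic HTP model (NOT YNet).

    Extrapolates the last observed velocity and adds Gaussian noise; the
    segmentation map is accepted but ignored to keep the sketch short.
    """
    velocity = past[-1] - past[-2]
    offsets = velocity * np.arange(1, steps + 1)[:, None]     # (steps, 2)
    noise = rng.normal(0.0, 0.5, size=(n_samples, steps, 2))
    return past[-1] + offsets + noise                         # (n_samples, steps, 2)


def mirrored_followup_predictions(seg_map, past, n_samples, steps, rng):
    """Run the mirrored follow-up case and map its output back to the source frame."""
    width = seg_map.shape[1]
    followup = toy_predict(mirror_scene(seg_map), mirror_points(past, width),
                           n_samples, steps, rng)
    return mirror_points(followup, width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seg_map = np.zeros((60, 80), dtype=int)
    past = np.array([[10.0, 20.0], [12.0, 21.0], [14.0, 22.0]])
    source = toy_predict(seg_map, past, 2000, 12, rng)
    followup = mirrored_followup_predictions(seg_map, past, 2000, 12, rng)
    gap = np.abs(source.mean(axis=0) - followup.mean(axis=0)).max()
    print(f"largest gap between mean trajectories: {gap:.3f}")
```

Because the toy model has no left/right bias, the two sets of sampled trajectories should agree up to sampling noise. A real model that treats the mirrored scene differently would produce a visibly different distribution, which is what a violation criterion is meant to detect.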

Abstract

The prediction of human trajectories is important for planning in autonomous systems that act in the real world, e.g. automated driving or mobile robots. Human trajectory prediction is a noisy process, and no prediction precisely matches any future trajectory. It is therefore approached as a stochastic problem, where the goal is to minimise the error between the true and the predicted trajectory. In this work, we explore the application of metamorphic testing for human trajectory prediction. Metamorphic testing is designed to handle unclear or missing test oracles, which makes it well suited to human trajectory prediction, where there is no clear criterion of correct or incorrect human behaviour. Metamorphic relations rely on transformations over source test cases and exploit invariants. This also fits human trajectory prediction, where expected human behaviour exhibits many symmetries under variations of the input, e.g. mirroring and rescaling of the input data. We discuss how metamorphic testing can be applied to stochastic human trajectory prediction and introduce the Wasserstein Violation Criterion to statistically assess whether a follow-up test case violates a label-preserving metamorphic relation.
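The Wasserstein Violation Criterion is described above only at a high level: trajectory distributions are compared via optimal transport, and a $z$-test flags significant differences. The sketch below is one possible reading of that description, not the paper's definition. It assumes equal-size trajectory samples with uniform weights (so optimal transport reduces to a minimum-cost assignment), a ground cost equal to the mean point-wise distance between two trajectories, and a reference distribution obtained by repeatedly comparing the source case with itself. The names `wasserstein_between_sets`, `wvc_violation`, and `make_toy_sampler` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import norm


def wasserstein_between_sets(a, b):
    """Empirical Wasserstein-1 distance between two equal-size trajectory sets.

    a, b: arrays of shape (n_samples, n_steps, 2). With uniform weights and
    equal sample counts, optimal transport is a minimum-cost assignment.
    """
    # Ground cost (assumption): mean point-wise Euclidean distance.
    cost = np.linalg.norm(a[:, None] - b[None, :], axis=-1).mean(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def wvc_violation(sample_source, sample_followup, n=20, n_ref=30, alpha=0.05):
    """Flag a violation if the source/follow-up distance is unusually large.

    Assumption: the reference mean and standard deviation come from
    source-vs-source comparisons, and a one-sided z-test is applied.
    """
    ref = np.array([
        wasserstein_between_sets(sample_source(n), sample_source(n))
        for _ in range(n_ref)
    ])
    d = wasserstein_between_sets(sample_source(n), sample_followup(n))
    z = (d - ref.mean()) / ref.std(ddof=1)
    p = norm.sf(z)  # one-sided p-value under a normal approximation
    return bool(p < alpha), float(z), float(p)


def make_toy_sampler(rng, drift):
    """Hypothetical stochastic predictor: a noisy walk with a constant drift."""
    drift = np.asarray(drift, dtype=float)

    def sample(n, steps=12):
        step = np.array([1.0, 0.0]) + drift + rng.normal(0.0, 0.1, size=(n, steps, 2))
        return np.cumsum(step, axis=1)  # positions, shape (n, steps, 2)

    return sample


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = make_toy_sampler(rng, [0.0, 0.0])
    shifted = make_toy_sampler(rng, [0.0, 0.5])  # follow-up that drifts sideways
    print(wvc_violation(source, shifted))
```

No ground-truth trajectory appears anywhere in this procedure, which is the point of a label-free criterion. The sample size `n`, the number of reference comparisons `n_ref`, and the significance level `alpha` are illustrative defaults, not values taken from the paper.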

Paper Structure (13 sections, 1 equation, 3 figures, 2 tables, 1 algorithm)

Introduction
Background
Metamorphic Testing
Human Trajectory Prediction
Related Work
Metamorphic Testing of Human Trajectory Prediction
Metamorphic Relations
Wasserstein Violation Criterion
Test Process
Experiments
Experimental Setup
Results
Conclusion and Future Work

Figures (3)

Figure 1: Inputs and Outputs of Human Trajectory Prediction. Data shows the little_1 scene from the Stanford Drone Dataset (Robicquet et al., 2016).
Figure 2: Classification scores for short-term forecasting.
Figure 3: Classification scores for long-term forecasting.

Theorems & Definitions (5)

definition 1: Metamorphic Relation (MR)
definition 2: Follow-up test cases
definition 3
definition 4: MR1: Mirroring
definition 5: MR2: Rescale
