Beyond ADE and FDE: A Comprehensive Evaluation Framework for Safety-Critical Prediction in Multi-Agent Autonomous Driving Scenarios

Feifei Liu, Haozhe Wang, Zejun Wei, Qirong Lu, Yiyang Wen, Xiaoyu Tang, Jingyan Jiang, Zhijian He

TL;DR

The paper addresses the insufficiency of ADE/FDE in capturing safety-critical and interactive dynamics in autonomous driving. It introduces a three-layer evaluation framework operating over semantic information, agent density, and road geometry, quantified by the Map Information Effectiveness metric $MIE = \frac{\text{Error}_{o} - \text{Error}_{w}}{\sqrt{\text{Error}_{o}}}$, to test predictions under map-free and map-rich conditions. Using nuScenes and AgentFormer as a baseline, the experiments reveal pronounced map dependency and safety-critical failure modes that traditional metrics overlook, especially in high-density and curved-road scenarios. These results establish scenario-aware validation as essential for developing robust, certifiable trajectory predictors for autonomous vehicles.
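As a minimal sketch of the MIE formula above, the snippet below computes the metric from two scalar error values. It assumes that $\text{Error}_{o}$ denotes the prediction error without map input and $\text{Error}_{w}$ the error with map input; that reading of the subscripts, along with the function and parameter names, is an assumption made for illustration and is not taken from the paper's code.

```python
import math


def map_information_effectiveness(error_without_map: float,
                                  error_with_map: float) -> float:
    """Compute MIE = (Error_o - Error_w) / sqrt(Error_o).

    Assumption: Error_o is the error without map input and Error_w is the
    error with map input (e.g. ADE or FDE in meters).
    """
    if error_without_map <= 0:
        # The square root in the denominator requires a positive error.
        raise ValueError("error_without_map must be positive")
    return (error_without_map - error_with_map) / math.sqrt(error_without_map)


# Illustrative values only, not results from the paper.
mie = map_information_effectiveness(error_without_map=4.0, error_with_map=3.0)
print(mie)
```

Under this reading, a positive value means the map lowered the error, and a value near zero means the model gained little from map information.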

Abstract

Current evaluation methods for autonomous driving prediction models rely heavily on simplistic metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE). While these metrics offer basic performance assessments, they fail to capture the nuanced behavior of prediction modules under complex, interactive, and safety-critical driving scenarios. For instance, existing benchmarks do not distinguish the influence of nearby versus distant agents, nor do they systematically test model robustness across varying multi-agent interactions. This paper addresses this critical gap by proposing a novel testing framework that evaluates prediction performance under diverse scene structures, namely map context, agent density, and spatial distribution. Through extensive empirical analysis, we quantify the differential impact of agent proximity on target trajectory prediction and identify scenario-specific failure cases that are not exposed by traditional metrics. Our findings highlight key vulnerabilities in current state-of-the-art prediction models and demonstrate the importance of scenario-aware evaluation. The proposed framework lays the groundwork for rigorous, safety-driven prediction validation, contributing significantly to the identification of failure-prone corner cases and the development of robust, certifiable prediction systems for autonomous vehicles.
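For reference, the two baseline metrics named in the abstract can be sketched as follows, using their standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all time steps, and FDE is the distance at the final time step. The function name and the 2D point representation are assumptions for illustration; the paper's own evaluation code may differ (for example, by taking the minimum over several sampled trajectories).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(predicted: Sequence[Point],
                        ground_truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory of 2D positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-time-step Euclidean distance between prediction and ground truth.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)  # average over all time steps
    fde = distances[-1]                    # final time step only
    return ade, fde


# Illustrative trajectories only.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)
```

Both values reduce a trajectory to a single distance, which is why they cannot, on their own, separate errors caused by missing map context from errors caused by dense or nearby agents.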

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of the proposed three-layer safety-critical evaluation framework for trajectory prediction models. The framework consists of layer 1 (a), which evaluates model performance with and without semantic maps using the proposed MIE metric; layer 2 (b), which assesses performance across four density levels; and layer 3 (c), which evaluates performance on straight versus curved road scenarios. Only models passing all three filters are certified as safety-critical for autonomous driving applications.
  • Figure 2: Visualization of trajectory samples with and without maps. (a) With-map scenario at an intersection. (b) Without-map scenario at an intersection.
  • Figure 3: Visualization of trajectory samples with different numbers of agents on nuScenes. (a) Few-agent scenario. (b) Many-agent scenario.
  • Figure 4: Visualization of trajectory samples by spatial distribution class on nuScenes. (a) Scenario with a curved road. (b) Scenario with a straight road.
  • Figure 5: Visualization of Trajectory Samples with Large Errors on nuScenes. (a) With semantic map: Despite having map information, the model still generates erroneous predictions leading to collisions. (b) Without semantic map: The absence of map information exacerbates the prediction errors, resulting in multiple collision scenarios.
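
The density layer described in Figure 1 amounts to grouping per-scene errors by agent count before averaging. The sketch below illustrates that idea with four levels. The thresholds (5, 10, 20 agents), the level names, and the function names are hypothetical placeholders, since the source does not state how the four density levels are defined.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical upper bounds for the first three levels; scenes above the
# last bound fall into a fourth level. These are NOT the paper's thresholds.
DENSITY_BOUNDS = [(5, "low"), (10, "medium"), (20, "high")]


def density_level(num_agents: int) -> str:
    """Map an agent count to one of four illustrative density levels."""
    for upper_bound, label in DENSITY_BOUNDS:
        if num_agents <= upper_bound:
            return label
    return "very_high"


def mean_error_by_density(
        scenes: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """Average a per-scene error (e.g. ADE) within each density level.

    Each scene is given as (num_agents, error).
    """
    buckets = defaultdict(list)
    for num_agents, error in scenes:
        buckets[density_level(num_agents)].append(error)
    return {label: mean(errors) for label, errors in buckets.items()}


# Illustrative values only, not results from the paper.
print(mean_error_by_density([(3, 1.0), (4, 2.0), (25, 4.0)]))
```

Reporting one mean per level, rather than a single dataset-wide average, is what allows a degradation in crowded scenes to show up instead of being averaged away.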