SafeShift: Safety-Informed Distribution Shifts for Robust Trajectory Prediction in Autonomous Driving

Benjamin Stoler; Ingrid Navarro; Meghdeep Jana; Soonmin Hwang; Jonathan Francis; Jean Oh

TL;DR

SafeShift tackles the lack of safety-critical scenarios in real-world autonomous driving data by introducing a safety-informed distribution-shift framework built on hierarchical scenario scoring and counterfactual probes. It characterizes scenarios with two feature sets (Individual and Social), aggregates them into TrajScore and SceneScore, and augments this with counterfactual future extrapolation to capture hidden risks. The approach demonstrates that safety-informed shifts significantly raise collision rates on standard predictors, and provides a remediation strategy that reduces test-time collisions by about 10% on average, with model-specific gains. Overall, SafeShift enables robust evaluation and targeted improvement of trajectory predictors without resorting to dangerous on-road testing or imperfect simulations, and releases open-source code for reproducibility and further research.

Abstract

As autonomous driving technology matures, the safety and robustness of its key components, including trajectory prediction, are vital. Though real-world datasets, such as Waymo Open Motion, provide realistic recorded scenarios for model development, they often lack truly safety-critical situations. Rather than relying on unrealistic simulation or dangerous real-world testing, we propose a framework to characterize such datasets and find the hidden safety-relevant scenarios within them. Our approach expands the spectrum of safety-relevance, allowing us to study trajectory prediction models under a safety-informed distribution-shift setting. We contribute a generalized scenario characterization method, a novel scoring scheme to find subtly avoided risky scenarios, and an evaluation of trajectory prediction models in this setting. We further contribute a remediation strategy, achieving a 10% average reduction in prediction collision rates. To facilitate future research, we release our code to the public: github.com/cmubig/SafeShift

Paper Structure (19 sections, 4 equations, 4 figures, 5 tables)

Introduction
Related Work
Socially-Aware Trajectory Prediction
Robustness Assessment in Trajectory Prediction
Critical Scenario Identification in Autonomous Driving
Preliminaries
Scenario Features
Scenario Scoring
Scoring Functions
Counterfactual Re-Scoring
Downstream Tasks
Distribution Shift Creation
Robust Trajectory Prediction
Experimental Setup
Results
...and 4 more sections

Figures (4)

Figure 1: Pearson correlation coefficients for each pair of metrics, showing how the features complement each other. Analysis performed on WOMD [ettinger2021large].
Figure 2: PDF of our score variations, exhibiting long-tailed behavior. Analysis performed on WOMD [ettinger2021large].
Figure 3: Examples of WOMD [shi2022motion] scenes by score. In-Distribution and Out-of-Distribution follow our Scoring split described in the Distribution Shift Creation section.
Figure 4: Qualitative examples of remediation approaches applied to MTR across two distinct scenarios. Trajectories progress from the pink starting points.