Surprise Potential as a Measure of Interactivity in Driving Scenarios
Wenhao Ding, Sushant Veer, Karen Leung, Yulong Cao, Marco Pavone
TL;DR
The paper addresses the scarcity of interactive driving data by proposing surprise potential (SP), a metric derived from the distribution shift in multi-agent trajectory predictions under counterfactual interventions. SP is defined as $S(\xi) = D(\mathcal{F}(\xi), \mathcal{F}\circ\mathcal{G}(\xi))$, which decomposes into a counterfactual generator $\mathcal{G}$, a future predictor $\mathcal{F}$, and a shift metric $D$; candidate instantiations are evaluated against human preferences on nuScenes. An exhaustive design-space exploration identifies Hist-prim with the Wasserstein distance on scene-centric and query-centric predictors as the strongest performer, achieving a correlation above $0.82$ with the human-aligned reward function. Downstream, curated interactive datasets improve planner safety metrics and can enhance learning through importance-weighted upsampling, suggesting practical value for benchmarking and training in multi-agent autonomous driving contexts.
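To make the composition $S(\xi) = D(\mathcal{F}(\xi), \mathcal{F}\circ\mathcal{G}(\xi))$ concrete, the following is a minimal sketch, not the authors' implementation. The scenario dictionary, `toy_counterfactual`, and `toy_predictor` are hypothetical stand-ins invented for illustration; only the three-way decomposition into $\mathcal{G}$, $\mathcal{F}$, and $D$ comes from the text. The shift metric here is a one-dimensional Wasserstein-1 distance between equal-size sample sets, a simplification of whatever trajectory-level distance a real instantiation would use.

```python
from typing import Callable, Dict, List

Scenario = Dict[str, float]  # hypothetical scenario container


def wasserstein_1d(a: List[float], b: List[float]) -> float:
    """W1 distance between two equal-size, uniformly weighted 1D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    # For equal-size empirical samples, W1 is the mean gap between sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def surprise_potential(
    xi: Scenario,
    G: Callable[[Scenario], Scenario],
    F: Callable[[Scenario], List[float]],
    D: Callable[[List[float], List[float]], float],
) -> float:
    """S(xi) = D(F(xi), F(G(xi)))."""
    return D(F(xi), F(G(xi)))


def toy_counterfactual(xi: Scenario) -> Scenario:
    # Hypothetical intervention: the ego vehicle comes to a stop.
    return {**xi, "ego_speed": 0.0}


def toy_predictor(xi: Scenario) -> List[float]:
    # Hypothetical predictor: samples of the follower's future speed.
    # The follower tracks the ego more strongly when the gap is small.
    influence = 1.0 / (1.0 + xi["gap"])
    mean = influence * xi["ego_speed"] + (1.0 - influence) * xi["follower_speed"]
    return [mean + d for d in (-1.0, -0.5, 0.0, 0.5, 1.0)]


close = {"ego_speed": 10.0, "follower_speed": 10.0, "gap": 1.0}
far = {"ego_speed": 10.0, "follower_speed": 10.0, "gap": 99.0}

sp_close = surprise_potential(close, toy_counterfactual, toy_predictor, wasserstein_1d)
sp_far = surprise_potential(far, toy_counterfactual, toy_predictor, wasserstein_1d)
print(f"close: {sp_close:.2f}, far: {sp_far:.2f}")
```

In this toy setup the closely following agent yields a larger score than the distant one, because the same ego intervention shifts its predicted future more. That is the intuition behind using the score to rank scenarios by interactivity.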
Abstract
Validating the safety and performance of an autonomous vehicle (AV) requires benchmarking on real-world driving logs. However, typical driving logs contain mostly uneventful scenarios with minimal interactions between road users. Identifying interactive scenarios in real-world driving logs enables the curation of datasets that amplify critical signals and provide a more accurate assessment of an AV's performance. In this paper, we present a novel metric that identifies interactive scenarios by measuring an AV's surprise potential on others. First, we identify three dimensions of the design space to describe a family of surprise potential measures. Second, we exhaustively evaluate and compare different instantiations of the surprise potential measure within this design space on the nuScenes dataset. To determine how well a surprise potential measure correctly identifies an interactive scenario, we use a reward model learned from human preferences to assess alignment with human intuition. Our proposed surprise potential, arising from this exhaustive comparative study, achieves a correlation of more than 0.82 with the human-aligned reward function, outperforming existing approaches. Lastly, we validate motion planners on curated interactive scenarios to demonstrate downstream applications.
