EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners
Niklas Hanselmann, Simon Doll, Marius Cordts, Hendrik P. A. Lensch, Andreas Geiger
TL;DR
Self-driving planners are rarely evaluated under realistic perception failures. To close this gap, the paper introduces EMPERROR, a transformer-based perception error model (PEM) built as a conditional variational autoencoder (CVAE). It models $P(\hat{\mathbf{B}} \mid \mathbf{S}, \mathcal{M})$ with per-object latent variables $\mathbf{Z}$ to generate diverse, scene-consistent detections. The PEM is integrated into a framework for probing an imitation-learning planner: an adversarial optimization over $\mathbf{Z}$ maximizes a collision-focused cost, while a prior regularization term keeps the samples plausible. Key contributions are (1) a faithful, scene-aware PEM that captures false positives, false negatives, and inter-object error correlations, (2) a framework for stress-testing planners with realistic perception noise, and (3) empirical evidence that the planner's safety degrades under plausible detection errors, with the collision rate increasing by up to 85%. The result is a practical tool for data-driven evaluation of self-driving systems that supports more thorough assessment of planning robustness.
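The adversarial search over $\mathbf{Z}$ can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a standard normal prior on the latents, so that the regularizer reduces to an L2 penalty, and it replaces the PEM, planner, and collision cost with a hypothetical linear stand-in (`toy_cost`). The names `objective`, `optimize_latents`, `lam`, and `lr` are illustrative assumptions.

```python
import numpy as np

def objective(z, cost, lam):
    # Cost to maximize minus an L2 penalty; the penalty equals the negative
    # log-density of an assumed standard normal prior, up to a constant.
    return cost(z) - 0.5 * lam * float(np.sum(z ** 2))

def optimize_latents(z0, cost_grad, lam=1.0, lr=0.1, steps=200):
    # Plain gradient ascent on objective(z); the penalty's gradient is -lam * z.
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z + lr * (cost_grad(z) - lam * z)
    return z

# Hypothetical stand-in for "PEM -> planner -> collision cost": linear in z.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 4))  # 3 objects, 4 latent dimensions each (arbitrary)

def toy_cost(z):
    return float(np.sum(w * z))

def toy_cost_grad(z):
    return w  # gradient of a linear cost is constant

z0 = rng.normal(size=w.shape)  # initial sample from the assumed prior
z_adv = optimize_latents(z0, toy_cost_grad, lam=2.0)
```

For this linear stand-in the regularized optimum has the closed form `w / lam`, which makes the role of the prior term visible: a larger `lam` pulls the adversarial latents back toward the prior mean, trading cost for plausibility. In a realistic setting the cost gradient would come from automatic differentiation through the PEM and planner rather than from a closed form.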
Abstract
To handle the complexities of real-world traffic, learning planners for self-driving from data is a promising direction. While recent approaches have shown great progress, they typically assume a setting in which the ground-truth world state is available as input. When deployed, however, planning needs to be robust to the long tail of errors incurred by a noisy perception system, which is often neglected in evaluation. To address this, previous work has proposed drawing adversarial samples from a perception error model (PEM) that mimics the noise characteristics of a target object detector. Yet these methods use simple PEMs that fail to accurately capture all failure modes of detection. In this paper, we present EMPERROR, a novel transformer-based generative PEM. We show that it imitates modern detectors more faithfully than previous work and apply it to stress-test an imitation learning (IL)-based planner. EMPERROR produces realistic noisy inputs that increase the planner's collision rate by up to 85%, demonstrating its utility as a tool for a more complete evaluation of self-driving planners.
