EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners

Niklas Hanselmann, Simon Doll, Marius Cordts, Hendrik P. A. Lensch, Andreas Geiger

TL;DR

The paper addresses a gap in how self-driving planners are evaluated under realistic perception failures. It introduces EMPERROR, a transformer-based perception error model (PEM) built on the conditional variational autoencoder (CVAE) framework. The model captures $P(\hat{\mathbf{B}} \mid \mathbf{S}, \mathcal{M})$, the distribution of detector outputs $\hat{\mathbf{B}}$ given the ground-truth state $\mathbf{S}$ and BEV map $\mathcal{M}$, and uses per-object latent variables $\mathbf{Z}$ to generate diverse, scene-consistent detections. The PEM is integrated into a framework for probing an imitation-learning planner: an adversarial optimization over $\mathbf{Z}$ maximizes a collision-focused cost, while a prior regularization term keeps the samples plausible. Key contributions include (1) a faithful, scene-aware PEM that captures false positives, false negatives, and inter-object error correlations, (2) a framework for stress-testing planners with realistic perception noise, and (3) empirical findings that the planner’s safety degrades significantly under plausible detection errors, with the collision rate increasing by up to 85%. The work offers a practical tool for data-driven evaluation of self-driving systems, enabling more thorough assessments of planning robustness and informing safer system design.
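
The adversarial search over the latent variables summarized above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the decoder is a linear shift of each object's perceived gap, the planner is a one-line rule, the prior is a standard normal, and the names `decode`, `planner_cost`, `objective`, and `adversarial_search` are hypothetical. The sketch only shows the shape of the idea: ascend a collision-focused cost while a prior penalty keeps the latents plausible.

```python
# Toy sketch only: the real PEM decoder and planner are learned networks.

def decode(z, true_gaps, scale=2.0):
    """Hypothetical decoder: each object's latent shifts its perceived gap (metres)."""
    return [gap + scale * zi for gap, zi in zip(true_gaps, z)]


def planner_cost(perceived_gaps, true_gaps, margin=2.0):
    """Hypothetical collision-focused cost.

    The toy planner drives up to `margin` metres short of the nearest perceived
    object. The cost is how far that travel overshoots the true nearest gap;
    a positive value means the plan ends inside the lead object.
    """
    planned_travel = max(0.0, min(perceived_gaps) - margin)
    return planned_travel - min(true_gaps)


def objective(z, true_gaps, lam):
    """Planner cost minus a standard-normal prior penalty (assumed prior)."""
    prior_penalty = 0.5 * sum(zi * zi for zi in z)
    return planner_cost(decode(z, true_gaps), true_gaps) - lam * prior_penalty


def adversarial_search(true_gaps, lam=1.0, lr=0.1, steps=200, eps=1e-4):
    """Gradient ascent on the latents using central finite differences."""
    z = [0.0] * len(true_gaps)  # start at the prior mean
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus = z[:i] + [z[i] + eps] + z[i + 1:]
            z_minus = z[:i] + [z[i] - eps] + z[i + 1:]
            diff = objective(z_plus, true_gaps, lam) - objective(z_minus, true_gaps, lam)
            grad.append(diff / (2 * eps))
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    gaps = [20.0, 30.0]  # true distances to two objects ahead
    z_adv = adversarial_search(gaps)
    print("latents:", z_adv)
    print("cost at prior mean:", planner_cost(decode([0.0, 0.0], gaps), gaps))
    print("cost after search:", planner_cost(decode(z_adv, gaps), gaps))
```

With these defaults the search settles near `z = [2.0, 0.0]`. The nearest object is perceived about 4 m farther away than it is, which turns the 2 m safety margin into a 2 m overshoot, while the second object's latent stays at the prior mean because it never influences the toy planner. Raising `lam` pulls the solution back toward the prior and weakens the attack, which is the trade-off the prior regularization term is meant to control.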

Abstract

To handle the complexities of real-world traffic, learning planners for self-driving from data is a promising direction. While recent approaches have shown great progress, they typically assume a setting in which the ground-truth world state is available as input. However, when deployed, planning needs to be robust to the long tail of errors incurred by a noisy perception system, which is often neglected in evaluation. To address this, previous work has proposed drawing adversarial samples from a perception error model (PEM) mimicking the noise characteristics of a target object detector. However, these methods use simple PEMs that fail to accurately capture all failure modes of detection. In this paper, we present EMPERROR, a novel transformer-based generative PEM, apply it to stress-test an imitation learning (IL)-based planner, and show that it imitates modern detectors more faithfully than previous work. Furthermore, it is able to produce realistic noisy inputs that increase the planner's collision rate by up to 85%, demonstrating its utility as a valuable tool for a more complete evaluation of self-driving planners.

Paper Structure

This paper contains 10 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Method Overview. We propose Emperror, a generative model that imitates a given detector by modeling the distribution of its perception errors conditioned on a ground-truth state and BEV map as scene context. Adversarial optimization in the model's latent space can then produce challenging yet plausible proxy detections from that distribution which stress-test the robustness of a given planner, e.g. by inducing collisions.
  • Figure 2: Generative Perception Error Model (PEM). We propose a PEM based on the CVAE framework to model the error characteristics of a target detector. It consists of a prior encoder, inferring a distribution over the latent variable $\mathbf{Z}$ given a ground-truth state $\mathbf{S}$ and BEV map $\mathcal{M}$ as scene context at test-time, and a deterministic decoder, which produces a set of proxy detections $\tilde{\mathbf{B}}$ given $\mathbf{Z}$. At training-time, a posterior encoder with an architecture similar to that of the prior encoder is used to encode and sample from the latent distribution. In contrast to the prior encoder, it also has access to privileged information in the form of the detector outputs $\hat{\mathbf{B}}$ to be reconstructed. Privileged components are shown in faint coloring.
  • Figure 3: PEM Qualitative Results. Perception errors modeled by a baseline PEM (middle) using an MLP with a Gaussian and Emperror (right) compared to DETR3D detections (left). Red boxes indicate ground truth objects, blue boxes the model predictions. While the baseline model samples implausible perception velocities and does not adapt to scene context, such as the parking area in Scene B, our approach closely mimics the target detector.
  • Figure 4: PEM Error Characteristics. We show the precision and mean regression errors for DETR3D, Emperror, and the MLP + Gauss configuration as a baseline. All metrics are normalized for clearer visual comparison.
  • Figure 5: Qualitative Examples of Worst-Case Perception Errors ($\kappa=3$). Sampling adversarial perception errors from Emperror can induce unwarranted acceleration and sudden braking in the planner, causing collisions. Red indicates ground truth objects, blue the model predictions. Non-filled boxes represent future states.
  • ...and 1 more figure
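
The training setup described in the Figure 2 caption (a posterior encoder with access to $\hat{\mathbf{B}}$, a prior encoder without it, and a decoder producing $\tilde{\mathbf{B}}$ from $\mathbf{Z}$) follows the usual CVAE pattern. A generic form of such an objective, written in the caption's notation, is sketched below. The reconstruction loss $\ell_{\text{rec}}$, the weight $\beta$, and the distribution symbols $q_\phi$ and $p_\theta$ are assumptions introduced here, since this listing does not give the paper's exact loss.

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{\mathbf{Z} \sim q_\phi(\mathbf{Z} \mid \mathbf{S}, \mathcal{M}, \hat{\mathbf{B}})}
      \Big[ \ell_{\text{rec}}\big( \tilde{\mathbf{B}}(\mathbf{Z}), \hat{\mathbf{B}} \big) \Big]
  + \beta \, D_{\mathrm{KL}}\Big(
      q_\phi(\mathbf{Z} \mid \mathbf{S}, \mathcal{M}, \hat{\mathbf{B}})
      \,\Big\|\,
      p_\theta(\mathbf{Z} \mid \mathbf{S}, \mathcal{M})
    \Big)
```

The first term asks the decoder to reconstruct the detector outputs from latents sampled by the privileged posterior encoder. The KL term pulls that posterior toward the scene-conditioned prior, so that at test-time sampling from the prior alone yields latents the decoder has learned to handle.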