
IGDrivSim: A Benchmark for the Imitation Gap in Autonomous Driving

Clémence Grislain, Risto Vuorio, Cong Lu, Shimon Whiteson

TL;DR

The paper tackles the imitation gap in imitation learning (IL) for autonomous driving. It formalizes the gap as a mismatch between the observations available to the expert and to the imitator, $O_{expert} \neq O_{imitator}$. It introduces IGDrivSim, a benchmark built on top of the Waymax simulator. IGDrivSim imposes partial observability to systematically study how behavioral cloning (BC) performs when the imitator's perception differs from that of the human drivers who produced the demonstrations. The key finding is that BC alone often fails to learn safe and effective policies under the imitation gap. Combining BC with reinforcement learning (PPO) through a simple penalty reward significantly mitigates these failures and improves safety metrics. By releasing open-source code and motion-prediction baselines, the work provides a practical tool for evaluating and developing perception-aware driving policies tailored to the sensors of self-driving cars.
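The BC-plus-RL combination can be pictured as a weighted sum of two losses. The sketch below makes three assumptions:

- The BC term is the negative log-likelihood of the expert's actions.
- The RL term is the standard PPO clipped surrogate, computed on imitator rollouts scored by the penalty reward.
- The value-function and entropy terms of a full PPO update are omitted.

The default weights `w_bc=1.0, w_rl=0.05` are taken from the Figure 3 caption. The names `Transition`, `bc_loss`, `ppo_clip_loss` and `combined_loss`, and the clip value `0.2`, are hypothetical and not taken from the IGDrivSim code.

```python
import math
from dataclasses import dataclass


@dataclass
class Transition:
    logp_new: float   # log pi_theta(a_t | o_t) under the current imitator policy
    logp_old: float   # log-prob under the policy that collected the rollout
    advantage: float  # advantage estimate derived from the penalty reward


def bc_loss(expert_logps):
    """Behavioral cloning: mean negative log-likelihood of the expert's actions."""
    return -sum(expert_logps) / len(expert_logps)


def ppo_clip_loss(batch, clip_eps=0.2):
    """Negated PPO clipped surrogate, averaged over the batch (lower is better)."""
    total = 0.0
    for t in batch:
        ratio = math.exp(t.logp_new - t.logp_old)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        total += min(ratio * t.advantage, clipped * t.advantage)
    return -total / len(batch)


def combined_loss(expert_logps, rl_batch, w_bc=1.0, w_rl=0.05):
    """Hypothetical combined objective: w_bc * L_BC + w_rl * L_PPO."""
    return w_bc * bc_loss(expert_logps) + w_rl * ppo_clip_loss(rl_batch)


if __name__ == "__main__":
    batch = [Transition(0.0, 0.0, 1.0), Transition(0.0, 0.0, -1.0)]
    print(combined_loss([-1.0, -3.0], batch))
```

The small `w_rl` keeps imitation as the dominant signal. The RL term only pushes the policy away from behaviors that the penalty reward marks as prohibited.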

Abstract

Developing autonomous vehicles that can navigate complex environments with human-level safety and efficiency is a central goal in self-driving research. A common approach to achieving this is imitation learning, where agents are trained to mimic human expert demonstrations collected from real-world driving scenarios. However, discrepancies between human perception and the self-driving car's sensors can introduce an imitation gap, leading to imitation learning failures. In this work, we introduce IGDrivSim, a benchmark built on top of the Waymax simulator, designed to investigate the effects of the imitation gap in learning autonomous driving policies from human expert demonstrations. Our experiments show that this perception gap between human experts and self-driving agents can hinder the learning of safe and effective driving behaviors. We further show that combining imitation with reinforcement learning, using a simple penalty reward for prohibited behaviors, effectively mitigates these failures. Our code is open-sourced at: https://github.com/clemgris/IGDrivSim.git.
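A "simple penalty reward for prohibited behaviors" can be read as a sparse, non-positive per-step signal. This sketch makes four assumptions:

- The prohibited behaviors are going off-road and overlapping another object. These are the two safety metrics reported in Figure 3.
- Any violation at a step costs a fixed penalty.
- Non-violating steps earn zero.
- The episodic return is the undiscounted sum over steps.

`StepMetrics`, `penalty_reward` and `episodic_return` are hypothetical names. The actual penalty set and magnitude in IGDrivSim may differ.

```python
from dataclasses import dataclass


@dataclass
class StepMetrics:
    offroad: bool  # ego vehicle has left the drivable area at this step
    overlap: bool  # ego bounding box overlaps another object at this step


def penalty_reward(m: StepMetrics, penalty: float = 1.0) -> float:
    """Assumed sparse penalty: -penalty if any prohibited behavior occurs, else 0."""
    return -penalty if (m.offroad or m.overlap) else 0.0


def episodic_return(steps, penalty: float = 1.0) -> float:
    """Undiscounted sum of per-step penalties over one episode."""
    return sum(penalty_reward(m, penalty) for m in steps)


if __name__ == "__main__":
    episode = [StepMetrics(False, False), StepMetrics(True, False), StepMetrics(True, True)]
    print(episodic_return(episode))
```

In this sketch, a step that is both off-road and overlapping is penalized once rather than twice. Summing the two violations separately is an equally plausible design choice.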

Paper Structure

This paper contains 24 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: IGDrivSim scenarios illustrating partial observability: the blue vehicle navigates among road lines, streetlights, and other vehicles. From left to right: (a) Circular field of view (blue), (b) Conic field of view (blue), (c) Noisy self-position (magenta), (d) Noisy vehicle detection (occluded vehicles in pink).
  • Figure 2: BC fails under the imitation gap. Solid lines show episodic returns for agents trained with RL or BC under full or partial observability (receptive field size 3). Dashed lines indicate upper-bound returns from converged RL agents approximating Bayes-optimal policies. Lines and shading show the mean and 95% confidence interval across 10 seeds.
  • Figure 3: Comparison of per-step metrics (log divergence, off-road, and overlap) between policies trained with BC (blue) and with combined BC-RL (orange) using parameters ($w_{BC}=1, w_{RL}=0.05$). The metrics are averaged over three seeds, with 95% confidence intervals shown as error bars, for imitators under different partial-observability settings.
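The partial-observability settings in Figure 1 can be approximated with a few geometric filters on the imitator's observation. This minimal sketch makes the following assumptions:

- Objects are 2D points.
- Distances are in meters and angles in radians.
- A circular field of view keeps objects within a radius of the ego vehicle.
- A conic field of view also requires the object to lie within a half-angle of the ego heading.
- Noisy self-position adds independent Gaussian noise to each coordinate.

`Obj`, `in_circular_fov`, `in_conic_fov` and `noisy_position` are hypothetical helpers, not Waymax or IGDrivSim APIs. Noisy vehicle detection (panel d) is not modeled here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Obj:
    x: float
    y: float


def in_circular_fov(ego_x, ego_y, obj, radius):
    """Panel (a): visible if within `radius` of the ego vehicle."""
    return math.hypot(obj.x - ego_x, obj.y - ego_y) <= radius


def in_conic_fov(ego_x, ego_y, ego_yaw, obj, radius, half_angle):
    """Panel (b): visible if within `radius` and within `half_angle` of the heading."""
    dx, dy = obj.x - ego_x, obj.y - ego_y
    if math.hypot(dx, dy) > radius:
        return False
    # Angle between the heading and the direction to the object, wrapped to [-pi, pi].
    rel = math.atan2(dy, dx) - ego_yaw
    rel = math.atan2(math.sin(rel), math.cos(rel))
    return abs(rel) <= half_angle


def noisy_position(x, y, sigma, rng):
    """Panel (c): ego position corrupted by zero-mean Gaussian noise of std `sigma`."""
    return x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    objs = [Obj(5.0, 0.0), Obj(-5.0, 0.0), Obj(0.0, 5.0), Obj(20.0, 0.0)]
    visible = [o for o in objs if in_conic_fov(0.0, 0.0, 0.0, o, 10.0, math.pi / 4)]
    print(visible)
    print(noisy_position(0.0, 0.0, 0.5, random.Random(0)))
```

Only the imitator's observation passes through such filters. The expert's demonstrations are unchanged, which is exactly what creates the gap $O_{expert} \neq O_{imitator}$.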