BITS: Bi-level Imitation for Traffic Simulation

Danfei Xu, Yuxiao Chen, Boris Ivanovic, Marco Pavone

TL;DR

BITS addresses the gap in realistic traffic behaviors for autonomous-vehicle simulation by learning from real driving logs with a bi-level imitation scheme that separates high-level intent from low-level controls. A spatial 2D goal distribution guides a deterministic goal-conditioned policy, while a prediction-and-planning module regularizes long-horizon rollouts with rule-based costs. The authors validate BITS on Lyft Level 5 and nuScenes, showing improved realism, diversity, and stability compared to baselines, and provide an open-source framework that unifies datasets for interactive simulation. The work offers a practical, data-driven framework and tools for AV validation and development.

Abstract

Simulation is the key to scaling up validation and verification for robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first-principles models for human-like behaviors is generally infeasible. In this work, we take a data-driven approach and propose a method that can learn to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors: it decouples the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), with scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open-source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home
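The bi-level decomposition described above can be illustrated with a small sketch. This is not the authors' implementation: the names `softmax_2d`, `sample_goals`, and `goal_conditioned_rollout` are hypothetical, the 3x3 logit grid stands in for the output of a learned spatial goal network, and straight-line interpolation stands in for the learned goal-conditioned policy. It only shows the division of labor: stochastic sampling at the intent level, a deterministic mapping at the control level.

```python
import math
import random


def softmax_2d(logits):
    """Normalize a 2D grid of goal logits into a probability grid."""
    flat = [v for row in logits for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    width = len(logits[0])
    return [[exps[r * width + c] / total for c in range(width)]
            for r in range(len(logits))]


def sample_goals(prob_grid, num_samples, rng):
    """High level: draw (row, col) goal cells in proportion to their probability."""
    cells = [(r, c) for r, row in enumerate(prob_grid) for c in range(len(row))]
    weights = [prob_grid[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=num_samples)


def goal_conditioned_rollout(start, goal, horizon):
    """Low level: deterministic stand-in policy that steps linearly to the goal."""
    sx, sy = start
    gx, gy = goal
    return [(sx + (gx - sx) * k / horizon, sy + (gy - sy) * k / horizon)
            for k in range(1, horizon + 1)]


rng = random.Random(0)
# Hypothetical goal logits; a real system would predict these from the scene.
goal_logits = [[0.0, 1.0, 0.0],
               [0.0, 3.0, 2.0],
               [0.0, 0.0, 0.0]]
probs = softmax_2d(goal_logits)
goals = sample_goals(probs, num_samples=4, rng=rng)
# For simplicity, grid indices are used directly as 2D coordinates.
plans = [goal_conditioned_rollout((0.0, 0.0), (float(r), float(c)), horizon=5)
         for r, c in goals]
```

Because all randomness lives in `sample_goals`, diversity in this sketch comes entirely from the sampled goals, while each goal maps to exactly one trajectory.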

Paper Structure (19 sections, 4 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: BITS framework overview: Decision context $c_t$ is a tensor containing the semantic map and rasterized agent history concatenated channel-wise. Given $c_t$ as input, (1) the spatial goal network produces a 2D spatial distribution of short-horizon goals, (2) the goal-conditional policy generates a set of actions for each sampled goal, (3) a trajectory forecasting model predicts the future motion of the neighboring agents, and finally (4), based on the predicted future states, the framework selects the set of actions that minimizes a rule-based cost function.
  • Figure 2: Trajectories generated by each stochastic method over 5 trials in the Lyft dataset (Houston et al., 2020). Our method (BITS, last row) generates diverse and stable long-horizon simulation rollouts (visualized as colored lines emanating from agents). Other methods suffer from a lack of diversity (e.g., TPP (Salzmann et al., 2020) on top) or high collision and off-road rates (e.g., TrafficSim (Suo et al., 2021) in the second row). Agents are represented with blue bounding boxes and trajectory line color denotes simulation timestep.
  • Figure 3: Left: Time-to-failure rates caused by road departure (offroad) errors. Right: Learned likelihood score of recorded trajectories under different levels of perturbations.
  • Figure 4: (Same as Fig. 1 in the main text) BITS framework overview: Decision context $c_t$ is a tensor containing the semantic map and rasterized agent history concatenated channel-wise. Given $c_t$ as input, (1) the spatial goal network produces a 2D spatial distribution of short-horizon goals, (2) the goal-conditional policy generates a sequence of actions for each sampled goal, (3) a trajectory forecasting model predicts the future motion of the neighboring agents, and finally (4), based on the predicted future states, the framework selects the sequence of actions that minimizes a rule-based cost function.
  • Figure 5: Vehicle-vehicle collision check illustration.
  • ...and 3 more figures
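
Steps (3) and (4) of the Figure 1 caption, selecting the candidate that minimizes a rule-based cost given predicted neighbor motion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's cost function: the disc-overlap collision test, the `is_drivable` callback, and the weights `w_col` and `w_off` are all hypothetical choices, and the predicted neighbor trajectories are assumed to be at least as long as each candidate.

```python
import math


def collision_cost(ego_traj, neighbor_trajs, radius):
    """Count timesteps at which the ego disc overlaps any predicted neighbor disc."""
    cost = 0
    for t, (ex, ey) in enumerate(ego_traj):
        for traj in neighbor_trajs:
            nx, ny = traj[t]
            if math.hypot(ex - nx, ey - ny) < 2 * radius:
                cost += 1
                break  # count each timestep at most once
    return cost


def offroad_cost(ego_traj, is_drivable):
    """Count waypoints that fall outside the drivable area."""
    return sum(0 if is_drivable(x, y) else 1 for x, y in ego_traj)


def select_plan(candidates, neighbor_trajs, is_drivable,
                radius=1.0, w_col=10.0, w_off=5.0):
    """Return the candidate trajectory with the lowest weighted rule-based cost."""
    def total_cost(traj):
        return (w_col * collision_cost(traj, neighbor_trajs, radius)
                + w_off * offroad_cost(traj, is_drivable))
    return min(candidates, key=total_cost)


# Two hypothetical candidates: drive straight, or swerve around a stopped car.
straight = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
swerve = [(1.0, 0.5), (2.0, 2.5), (3.0, 3.0)]
stopped_car = [(3.0, 0.0)] * 3  # predicted to stay in place


def on_road(x, y):
    """Hypothetical drivable area: a straight road 8 units wide."""
    return abs(y) <= 4.0


best = select_plan([straight, swerve], [stopped_car], on_road)
```

Here `straight` overlaps the stopped car at the last two timesteps, so `select_plan` returns `swerve`. Ties are broken by candidate order, since `min` keeps the first minimum it encounters.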