Discrete Contrastive Learning for Diffusion Policies in Autonomous Driving

Kalle Kujanpää, Daulet Baimukashev, Farzeen Munir, Shoaib Azam, Tomasz Piotr Kucner, Joni Pajarinen, Ville Kyrki

TL;DR

This work tackles the problem of realistic autonomous-driving simulation by explicitly modeling diverse human driving styles. It introduces Discrete Style Diffusion Policy (DSDP). DSDP first learns discrete driving styles through contrastive learning with the InfoNCE loss and lookup-free quantization (LFQ). It then trains a conditional denoising diffusion probabilistic model (DDPM) to generate actions conditioned on both the observation and the style. Empirical evaluation on NGSIM and Highway-ENV shows that DSDP yields safer and more human-like trajectories than machine-learning-based baselines. Ablations confirm the importance of contrastive style extraction and discrete style conditioning. The approach aims to make driving simulations more realistic, which could increase the fidelity of autonomous-vehicle evaluation and support better sim-to-real transfer.
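The InfoNCE objective named above treats the two sub-trajectories drawn from the same trajectory as a positive pair. Sub-trajectories from other trajectories in the batch act as negatives, as the Figure 2 caption describes. The sketch below shows that loss under a few assumptions. It uses cosine similarity and a temperature of 0.1, because this summary gives neither the paper's similarity function nor its temperature. `info_nce` and `cosine` are hypothetical helpers, not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] are style embeddings of two sub-trajectories
    from the same trajectory; positives[j] for j != i serve as negatives.
    The cosine similarity and temperature=0.1 are illustrative assumptions.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Scaled similarity of anchor i to every candidate in the batch.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching positive (index i) as the target.
        total += log_sum_exp - logits[i]
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
# aligned is close to 0; swapped is close to 10 (= 1 / temperature).
print(aligned, swapped)
```

The loss is small when each anchor is most similar to its own positive and large when a negative outscores it. If all embeddings are identical, the loss equals log(batch size).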

Abstract

Learning accurate and rich simulations of human driving behavior from data for autonomous vehicle testing remains challenging, because human driving styles are highly diverse and variable. We address this challenge with a novel approach that uses contrastive learning to extract a dictionary of driving styles from existing human driving data. We discretize these styles with quantization and use them to learn a conditional diffusion policy that simulates human drivers. Our empirical evaluation confirms that the behaviors generated by our approach are both safer and more human-like than those of machine-learning-based baseline methods. We believe this has the potential to enable higher realism and more effective techniques for evaluating and improving the performance of autonomous vehicles.
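The TL;DR identifies the quantization step as LFQ, i.e. lookup-free quantization, the scheme introduced with MAGVIT-v2. The sketch below shows the LFQ forward pass in its commonly used form, which has no learned codebook. Each latent dimension is snapped to -1 or +1. The resulting sign pattern reads directly as a binary integer index, the kind of index the prior is trained to predict according to the Figure 2 caption.

Several things here are assumptions and may differ from the paper's layer. The function names are hypothetical. The straight-through gradient estimator and the entropy penalty normally used when training LFQ are omitted.

```python
def lfq_quantize(z):
    """Lookup-free quantization: snap each latent dimension to -1 or +1."""
    return [1.0 if x > 0 else -1.0 for x in z]


def lfq_index(z):
    """Integer index of z's entry in the implicit {-1, +1}^d codebook.

    Bit k of the index is 1 exactly when dimension k is positive.
    """
    return sum(1 << k for k, x in enumerate(z) if x > 0)


def lfq_code(index, dim):
    """Inverse of lfq_index: the +/-1 code vector for a given style index."""
    return [1.0 if (index >> k) & 1 else -1.0 for k in range(dim)]


# A 3-dimensional latent gives 2**3 = 8 possible discrete styles.
z = [0.3, -1.2, 0.8]
print(lfq_quantize(z))  # [1.0, -1.0, 1.0]
print(lfq_index(z))     # 5
print(lfq_code(5, 3))   # [1.0, -1.0, 1.0]
```

Because the codebook is implicit, a d-dimensional latent yields 2^d styles without a nearest-neighbour codebook lookup. This is the property the name "lookup-free" refers to.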

Paper Structure

This paper contains 14 sections, 5 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: An overview of our method, DSDP (Discrete Style Diffusion Policy). DSDP takes a history of past driving behavior as observation-action pairs $(\mathbf{o}, \mathbf{a})$ and uses a prior network to sample a driving style $\mathbf{c}$. The styles are learned via discrete contrastive learning. The denoising network then uses the encoded current observation $\mathbf{o}$ and style $\mathbf{c}$ to generate a driving action $\mathbf{a}$ through diffusion.
  • Figure 2: Contrastive pre-training (left), policy and prior training (upper right), and evaluation (bottom right). During contrastive pre-training, we sample two sub-trajectories, $\mathbf{x} = (\mathbf{o}_{i,1}, \mathbf{a}_{i,1}, \dots, \mathbf{o}_{i,L_c}, \mathbf{a}_{i,L_c})$ and $\mathbf{y}^{+}$, from each trajectory $\tau_i$. We process them with the contrastive representation function, which consists of the encoder $f_\text{enc}$, the LFQ discretization layer, and the decoder $f_\text{dec}$, to obtain the styles $\mathbf{c}_\text{anchor}$ and $\mathbf{c}_\text{positive}$. The styles of sub-trajectories from the other trajectories in the batch serve as negatives, $\mathbf{c}_\text{negative}$, in the computation of the InfoNCE loss. The prior $p(\mathbf{x})$ is trained to predict the index of the encoded sub-trajectory $\mathbf{x}$, and the policy $\pi$ is trained to minimize the DDPM loss. Two parallel lines mark where the gradient flow is cut: the contrastive representation function $f$ is fixed during prior and policy training.
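The captions describe the pipeline at evaluation time. A prior samples a discrete style, and a denoising network generates the action by diffusion, conditioned on the observation and that style. The policy is trained with the DDPM loss. The sketch below illustrates that loop under stated assumptions. It uses the standard epsilon-prediction objective and ancestral sampler of Ho et al. (2020), with their linear beta schedule and T = 1000; the paper's schedule and step count are not given here.

The following names are hypothetical illustrations, not the authors' networks:

- `sample_style` draws a style index from prior logits.
- `STYLE_ACTIONS` maps each style to one fixed action.
- `oracle_eps` is the closed-form noise predictor for that toy case. It stands in for the trained denoising network and ignores the observation.

```python
import math
import random

# Linear beta schedule from Ho et al. (2020); assumed, not taken from the paper.
T = 1000
BETAS = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_running = 1.0
for _a in ALPHAS:
    _running *= _a
    ALPHA_BARS.append(_running)  # cumulative product alpha_bar_t


def sample_style(prior_logits, rng):
    """Draw a style index from a categorical prior given unnormalized logits."""
    m = max(prior_logits)
    weights = [math.exp(z - m) for z in prior_logits]  # softmax up to a constant
    return rng.choices(range(len(prior_logits)), weights=weights)[0]


def ddpm_loss(eps_model, action, obs, style, rng):
    """Epsilon-prediction DDPM loss for one (action, observation, style) sample."""
    t = rng.randrange(T)  # 0-based diffusion step
    ab = ALPHA_BARS[t]
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    # Forward process: x_t = sqrt(alpha_bar_t) * a + sqrt(1 - alpha_bar_t) * eps.
    x_t = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(action, noise)]
    pred = eps_model(x_t, t, obs, style)
    return sum((e - p) ** 2 for e, p in zip(noise, pred)) / len(action)


def ddpm_sample(eps_model, obs, style, dim, rng):
    """Ancestral sampling from x_T ~ N(0, I) down to an action x_0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        eps = eps_model(x, t, obs, style)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(BETAS[t])  # sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


# Toy stand-in for the trained denoiser: each (hypothetical) style maps to one
# action, so the optimal noise predictor is known in closed form.
STYLE_ACTIONS = {0: [-1.0, 0.0], 1: [1.0, 0.0]}


def oracle_eps(x_t, t, obs, style):
    ab = ALPHA_BARS[t]
    a0 = STYLE_ACTIONS[style]
    return [(x - math.sqrt(ab) * a) / math.sqrt(1.0 - ab) for x, a in zip(x_t, a0)]


rng = random.Random(0)
style = sample_style([0.0, 2.0], rng)  # hypothetical prior output for two styles
action = ddpm_sample(oracle_eps, obs=None, style=style, dim=2, rng=rng)
print(style, action)  # action is approximately STYLE_ACTIONS[style]
```

With the exact noise predictor, the final denoising step returns the style's action up to floating-point error, and the training loss is essentially zero. A learned network conditioned on the encoded observation and the LFQ style code would replace `oracle_eps` in practice.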