CAPS: Context-Aware Priority Sampling for Enhanced Imitation Learning in Autonomous Driving

Hamidreza Mirkhani, Behzad Khamidehi, Ehsan Ahmadi, Fazel Arasteh, Mohammed Elmahgiubi, Weize Zhang, Umar Rajguru, Kasra Rezaee

TL;DR

The paper tackles data imbalance in imitation learning (IL) for autonomous driving, showing that standard IL can underperform in edge cases. It introduces CAPS, a two-stage framework: a VQ-VAE first clusters context-rich trajectory data, and the training samples are then reweighted by cluster frequency. The framework is guided by a VectorNet-based context encoder and a contingency-aware trajectory decoder. Closed-loop evaluation on Bench2Drive scenarios (CARLA Leaderboard 2) shows CAPS outperforming state-of-the-art baselines, indicating improved generalization to rare but critical scenarios. The work offers a data-efficient path to robust autonomous driving policies and suggests practical benefits for fleet-scale data collection and prioritization.

Abstract

In this paper, we introduce CAPS (Context-Aware Priority Sampling), a novel method designed to enhance data efficiency in learning-based autonomous driving systems. CAPS addresses the challenge of imbalanced training datasets in imitation learning by leveraging Vector Quantized Variational Autoencoders (VQ-VAEs). The use of VQ-VAE provides a structured and interpretable data representation, which helps reveal meaningful patterns in the data. These patterns are used to group the data into clusters, with each sample being assigned a cluster ID. The cluster IDs are then used to re-balance the dataset, ensuring that rare yet valuable samples receive higher priority during training. By ensuring a more diverse and informative training set, CAPS improves the generalization of the trained planner across a wide range of driving scenarios. We evaluate our method through closed-loop simulations in the CARLA environment. The results on Bench2Drive scenarios demonstrate that our framework outperforms state-of-the-art methods, leading to notable improvements in model performance.
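
The re-balancing step described in the abstract can be illustrated with a minimal sketch. It assumes each sample already carries a cluster ID and uses plain inverse-frequency weighting; the abstract does not give the exact weighting rule, so the function name `cluster_balanced_weights` and the exponent `alpha` are hypothetical and not taken from the paper.

```python
from collections import Counter


def cluster_balanced_weights(cluster_ids, alpha=1.0):
    """Return one sampling weight per sample, normalized to sum to 1.

    Assumption (not from the paper): weight is proportional to
    count(cluster) ** -alpha. With alpha=1.0 every cluster receives the
    same total probability mass; with alpha=0.0 sampling is uniform.
    """
    counts = Counter(cluster_ids)
    raw = [counts[c] ** (-alpha) for c in cluster_ids]
    total = sum(raw)
    return [w / total for w in raw]


# Three samples from a common cluster (0) and one from a rare cluster (1).
ids = [0, 0, 0, 1]
weights = cluster_balanced_weights(ids)
uniform = cluster_balanced_weights(ids, alpha=0.0)
```

With `alpha=1.0` the single rare sample is drawn as often as the three common samples combined. Weights of this kind could be passed to a weighted sampler such as `random.choices(range(len(ids)), weights=weights, k=batch_size)`.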

Paper Structure

This paper contains 8 sections, 3 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Block diagram of the CAPS framework. In stage 1, a VQ-VAE module is used to train the clustering model. In stage 2, the trained model estimates codebook IDs for the training samples, and by adjusting the cluster frequencies, the samples are re-weighted to fine-tune the main planner model.
  • Figure 2: Comparison of two examples of scenes clustered using our approach, where each row corresponds to the same codebook ID. The first row predominantly contains parking cut-in scenarios, while the second row features instances where the ego vehicle stops and waits behind a stationary vehicle or obstacle.
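
The codebook-ID estimation mentioned in the Figure 1 caption can be sketched as the nearest-neighbor lookup that VQ-VAEs use for quantization. This is an illustrative sketch only: the latent vectors, the codebook values, and the helper name `assign_codebook_ids` are hypothetical, and the paper's encoder and codebook size are not specified in this summary.

```python
def assign_codebook_ids(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Distance is squared Euclidean, as in standard VQ-VAE quantization.
    The returned index serves as the sample's cluster ID.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


# Toy 2-D example with a two-entry codebook (values are made up).
codebook = [[0.0, 0.0], [1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.4]]
cluster_ids = assign_codebook_ids(latents, codebook)
```

Samples that share a codebook ID, like the scenes in each row of Figure 2, would then be treated as one cluster when cluster frequencies are computed.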