Embodied Cognition Augmented End2End Autonomous Driving

Ling Niu, Xiaoji Zheng, Han Wang, Chen Zheng, Ziyuan Yang, Bokui Chen, Jiangtao Gong

TL;DR

This work tackles the supervision gap in vision-based end-to-end autonomous driving by introducing E3AD, a brain-inspired paradigm that learns driving cognition through cross-modal contrastive learning between a visual driving encoder and the large EEG model LaBraM. A Driving-Thinking model is trained on a self-collected cognitive dataset and then frozen to augment mainstream end-to-end driving frameworks through three interaction schemes, yielding substantial gains in planning accuracy and safety metrics while maintaining efficient inference. Key contributions include the first integration of human driving cognition into end-to-end planning, ablation analyses validating the role of EEG-guided cognition, and demonstrations of improved performance on nuScenes and Bench2Drive, with plans to release the dataset and code. This approach lays groundwork for embodied, brain-inspired augmentation of autonomous driving systems, though it faces data-scale challenges and motivates further exploration of the underlying cognitive mechanisms.

Abstract

In recent years, vision-based end-to-end autonomous driving has emerged as a new paradigm. However, popular end-to-end approaches typically rely on visual feature extraction networks trained under label supervision. This limited supervision framework restricts the generality and applicability of driving models. In this paper, we propose a novel paradigm termed $E^{3}AD$, which advocates contrastive learning between visual feature extraction networks and a general-purpose large EEG model, in order to learn latent human driving cognition for enhancing end-to-end planning. We collect a cognitive dataset for this contrastive learning process. We then investigate the methods and potential mechanisms for enhancing end-to-end planning with human driving cognition, using popular driving models as baselines on publicly available autonomous driving datasets. Both open-loop and closed-loop tests are conducted for a comprehensive evaluation of planning performance. Experimental results demonstrate that the $E^{3}AD$ paradigm significantly enhances the end-to-end planning performance of baseline models. Ablation studies further validate the contribution of driving cognition and the effectiveness of the contrastive learning process. To the best of our knowledge, this is the first work to integrate human driving cognition for improving end-to-end autonomous driving planning. It represents an initial attempt to incorporate embodied cognitive data into end-to-end autonomous driving, providing valuable insights for future brain-inspired autonomous driving systems. Our code will be made available on GitHub.

Paper Structure

This paper contains 27 sections, 10 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Training is divided into two stages. In the first stage, contrastive learning is conducted on a self-collected dataset between the Driving-Thinking Model (a pretrained spatio-temporal feature extraction network) and the large EEG model LaBraM [LaBraM2024]. During second-stage training and inference, we use the same inputs as other driving models and introduce no EEG data; the entire Driving-Thinking Model is kept frozen. We then design three different frameworks to investigate how the driving cognition learned by the Driving-Thinking Model enhances end-to-end planning, and through which mechanisms.
  • Figure A.1: Example snapshots of the video modalities: top left – Baseline (forward road view); top right – Driver1 (driver’s face view); bottom left – Driver2 (driver’s posture view); bottom right – Driver3 (driver’s feet view).
  • Figure A.2: Visualization Comparison of $E^{3}AD$ (VAD-Base) and the Baseline on Closed-loop Evaluation.
  • Figure A.3: Real-world data validation
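
The first training stage described in Figure 1 aligns visual features with EEG features through contrastive learning. The exact objective is not stated in this section, so the sketch below assumes a CLIP-style symmetric InfoNCE loss, which is one common choice for cross-modal alignment and not necessarily the authors' formulation. The function names, the temperature value, and the toy embeddings are all hypothetical; in practice the two inputs would be outputs of the Driving-Thinking Model and of LaBraM for time-aligned clips.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(visual_emb, eeg_emb, temperature=0.07):
    """Assumed CLIP-style loss: row i of each batch is a matched (video, EEG) pair."""
    v = [l2_normalize(x) for x in visual_emb]
    e = [l2_normalize(x) for x in eeg_emb]
    # Pairwise cosine similarities, sharpened by the (hypothetical) temperature.
    logits = [[sum(a * b for a, b in zip(vi, ej)) / temperature for ej in e]
              for vi in v]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the visual-to-EEG and EEG-to-visual directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy batch of two pairs with made-up 2-D embeddings.
aligned_visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_eeg = [[2.0, 0.1], [0.1, 2.0]]
shuffled_eeg = [aligned_eeg[1], aligned_eeg[0]]  # deliberately mismatched pairs

loss_aligned = symmetric_info_nce(aligned_visual, aligned_eeg)
loss_shuffled = symmetric_info_nce(aligned_visual, shuffled_eeg)
print(loss_aligned < loss_shuffled)
```

In this toy batch the matched pairs give a lower loss than the shuffled ones, which is the signal that would pull the visual encoder toward the EEG representation. Under the second-stage setup in Figure 1, the encoder trained this way would then be frozen, so no gradient from the planning loss would reach it.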