Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

Yiqun Duan; Zhuoli Zhuang; Jinzhao Zhou; Yu-Cheng Chang; Yu-Kai Wang; Chin-Teng Lin

TL;DR

The paper tackles the problem of limited generalization and trust in End-to-End autonomous driving by introducing synchronized human-machine driving data, including eye-tracking and EEG signals, to guide learning. It proposes a Hybrid Fusion Transformer Encoder with Monotonic-to-BEV Translation and a Decision Transformer, augmented with human-guidance headers for eye-tracking and intention signals, trained via a joint loss that combines perception, planning, and supervision terms. Experimental results in CARLA show that human eye-tracking guidance improves Driving Score, while brainwave-based intention guidance does not yield immediate gains due to signal noise and alignment issues, illustrating both the potential and current challenges of human-guided autonomy. The approach highlights a concrete pathway toward more robust and trustworthy autonomous systems through multimodal human supervision, while identifying key technical hurdles to overcome for scalable adoption.

Abstract

This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. Current advances in End-to-End autonomous driving are typically data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap, our research introduces a novel approach that synchronously collects data from human and machine drivers under identical driving scenarios, focusing on eye-tracking and brainwave data to guide machine perception and decision-making processes. This paper uses the CARLA simulator to evaluate the impact of human behavior guidance. Experimental results show that using human attention to guide machine attention can bring a significant improvement in driving performance. However, guidance by human intention remains a challenge. This paper pioneers a promising direction for utilizing human behavior guidance to enhance autonomous systems.

Paper Structure (23 sections, 8 equations, 5 figures, 1 table)

Introduction
Related Works
End-to-End Autonomous Driving
Real-Time Human Guidance for Autonomous Systems
Methodology
Overview
Hybrid Fusion Transformer Encoder
Monotonic-to-BEV Translation (MBT)
Decision Transformer
Basic Prediction Headers
Waypoints Prediction
Density Map Status Prediction
Traffic Rule Prediction
Human-Guidance Headers
Human Eye-Tracking Attention Prediction
...and 8 more sections

Figures (5)

Figure 1: Visual schema of human-enhanced autonomous driving. The machine and the human are subjected to identical driving scenarios along the same route. Throughout this process, we synchronously collect data on human behavior, including eye-tracking metrics, brainwave patterns, and brake signaling.
Figure 2: Framework for injecting human guidance into the autonomous system, taking autonomous driving as an example. The lower part is the hybrid fusion transformer encoder defined in the Hybrid Fusion Transformer Encoder section. The learned machine state is fed into a decision transformer to produce the final driving waypoint prediction. The decision transformer is supervised by the ground-truth (GT) waypoints and other safety constraints. Human guidance is injected by adding two supervision branches based on human-behavior data. The decision transformer is required to reconstruct human behavior (eye-tracking attention and brake intention) jointly with the other targets.
Figure 3: The structure of the MBT attention module, where the features from the monotonic view are projected into BEV space through a sequence-to-sequence formulation.
Figure 4: Visualization example of the collected combined human eye-tracking attention data.
Figure 5: Example of the statistical distribution of the left and right eyes' pupil position and diameter change during one episode.
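
The TL;DR and the Figure 2 caption describe training with a joint loss in which the decision transformer reconstructs human behavior (eye-tracking attention and brake intention) together with the basic prediction targets. A minimal sketch of how such a weighted sum could be assembled is given below. The names `LossWeights`, `l1`, and `joint_loss`, the grouping of terms, and the default weights of 1.0 are all illustrative assumptions, since this summary does not state the paper's actual loss formulas or coefficients.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    # Hypothetical weights: the summary does not give the values used in the paper.
    waypoint: float = 1.0         # planning term (waypoints prediction)
    density: float = 1.0          # perception term (density map status)
    traffic: float = 1.0          # perception term (traffic rule prediction)
    eye_attention: float = 1.0    # human-guidance term (eye-tracking attention)
    brake_intention: float = 1.0  # human-guidance term (brake intention)


def l1(pred, target):
    """Mean absolute error between two equal-length sequences (assumed term shape)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(terms, weights, use_eye=True, use_intention=True):
    """Weighted sum of per-header losses.

    `terms` maps header names to already-computed scalar losses.
    The two flags switch the human-guidance branches on or off,
    mirroring an ablation with and without each guidance signal.
    """
    total = (
        weights.waypoint * terms["waypoint"]
        + weights.density * terms["density"]
        + weights.traffic * terms["traffic"]
    )
    if use_eye:
        total += weights.eye_attention * terms["eye_attention"]
    if use_intention:
        total += weights.brake_intention * terms["brake_intention"]
    return total


# Example with made-up scalar losses for each header.
example_terms = {
    "waypoint": 0.5,
    "density": 0.25,
    "traffic": 0.25,
    "eye_attention": 1.0,
    "brake_intention": 2.0,
}
baseline = joint_loss(example_terms, LossWeights(), use_eye=False, use_intention=False)
with_eye = joint_loss(example_terms, LossWeights(), use_intention=False)
full = joint_loss(example_terms, LossWeights())
```

Keeping each human-guidance branch behind its own flag makes it straightforward to compare a baseline against eye-tracking guidance alone and against both signals, which is the kind of comparison the reported results imply.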
