
AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems

Zhuoli Zhuang, Cheng-You Lu, Yu-Cheng Fred Chang, Yu-Kai Wang, Thomas Do, Chin-Teng Lin

TL;DR

AEGIS addresses interpretability and data efficiency in reinforcement learning (RL) for autonomous driving by injecting human attention into the RL policy as guidance through a self-attention module. It uses a large VR eye-tracking dataset (1.2 million frames from 20 participants across six scenarios) to pre-train a human-attention predictor. It then applies KL-divergence regularization to align machine attention with human attention during the first phase of RL training, together with an auxiliary time-to-collision (TTC) prediction loss for safety. Experiments in CARLA across car-following, left-turn, and occlusion scenarios show faster convergence, better generalization to unseen towns, and closer alignment between machine and human attention, and a user study reports improved interpretability and perceived safety. The work contributes a large, immersive eye-tracking dataset, a human-attention-guided RL framework, and evidence that attention guidance can yield more robust and explainable autonomous intelligent vehicles (AIVs), a step toward safer autonomous systems.
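The two auxiliary signals mentioned above (KL alignment of attention maps and a TTC prediction loss) can be sketched as extra terms added to an RL objective. This is a minimal sketch under assumptions, not the authors' implementation: the KL direction (human relative to machine), the weights `lambda_kl` and `lambda_ttc`, the phase cutoff `guidance_steps`, the TTC cap `max_ttc`, and all function names are hypothetical. The TTC label uses the common definition, gap divided by closing speed.

```python
import math
from typing import Sequence

def time_to_collision(gap_m: float, ego_speed: float, lead_speed: float,
                      max_ttc: float = 10.0) -> float:
    """Common TTC definition: gap / closing speed, capped at a hypothetical max_ttc."""
    closing_speed = ego_speed - lead_speed
    if closing_speed <= 0.0:
        # Not closing in on the lead object: treat as "no collision soon".
        return max_ttc
    return min(gap_m / closing_speed, max_ttc)

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete attention distributions over the same cells."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def attention_guidance_loss(human_attn: Sequence[float], machine_attn: Sequence[float],
                            ttc_pred: Sequence[float], ttc_true: Sequence[float],
                            step: int, guidance_steps: int = 100_000,
                            lambda_kl: float = 1.0, lambda_ttc: float = 0.1) -> float:
    """Hypothetical auxiliary terms added to the RL loss (the RL loss itself is not shown)."""
    # TTC head: mean squared error over the batch.
    mse = sum((a - b) ** 2 for a, b in zip(ttc_pred, ttc_true)) / len(ttc_true)
    loss = lambda_ttc * mse
    # Human-attention guidance is active only during the first training phase.
    if step < guidance_steps:
        loss += lambda_kl * kl_divergence(human_attn, machine_attn)
    return loss
```

For example, an ego car at 15 m/s that is 20 m behind a lead car at 5 m/s gets a TTC label of `time_to_collision(20.0, 15.0, 5.0) == 2.0`. Gating the KL term on `step` mirrors the idea of guiding attention early and letting the policy train on its own afterwards.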

Abstract

Improving the decision-making capabilities of Autonomous Intelligent Vehicles (AIVs) has been an active research topic in recent years. Despite advancements, training machines to capture the regions of interest needed for comprehensive scene understanding, as humans do through perception and reasoning, remains a significant challenge. This study introduces a novel framework, Human Attention-based Explainable Guidance for Intelligent Vehicle Systems (AEGIS). AEGIS converts eye-tracking data into human attention and uses a pre-trained human attention model to guide reinforcement learning (RL) models toward the regions of interest that are critical for decision-making. Using 1.2 million frames collected from 20 participants across six scenarios, AEGIS pre-trains a model to predict human attention patterns.
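One common way to turn raw gaze samples into an attention map is to place a Gaussian at each gaze point on a coarse grid and normalize the result into a probability distribution. The sketch below assumes that approach; the grid size, `sigma`, and the function name are hypothetical, and the paper's actual conversion procedure may differ.

```python
import math

def gaze_to_attention_map(gaze_points, height, width, sigma=1.0):
    """Convert (row, col) gaze samples into a normalized height x width attention map.

    Hypothetical helper: each gaze sample contributes an isotropic Gaussian, and the
    summed map is normalized so all cells add up to 1.
    """
    grid = [[0.0] * width for _ in range(height)]
    for gr, gc in gaze_points:
        for r in range(height):
            for c in range(width):
                d2 = (r - gr) ** 2 + (c - gc) ** 2
                grid[r][c] += math.exp(-d2 / (2 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    if total == 0.0:
        # No gaze recorded for this frame: fall back to a uniform map.
        return [[1.0 / (height * width)] * width for _ in range(height)]
    return [[v / total for v in row] for row in grid]
```

Normalizing to a distribution makes the human map directly comparable to a softmax-based machine attention map, for example with a KL divergence.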

Paper Structure

This paper contains 23 sections, 6 equations, 18 figures, 12 tables.

Figures (18)

  • Figure 1: The dataset collection environment. The HTC VIVE Pro Eye VR headset and Logitech G923 Racing Wheel and Pedal give the subject a more realistic driving experience.
  • Figure 2: Car Following: The ego vehicle must avoid collisions with the car ahead by controlling the throttle and brake, ensuring it continues to follow the lead car. Left Turn: The ego vehicle must accurately time its left turn to avoid collisions with vehicles proceeding straight by controlling the throttle and brake.
  • Figure 3: Diverse occlusion scenes. The ego vehicle must control the throttle and brake to prevent collisions with occluded objects, such as pedestrians and cars.
  • Figure 4: Training and testing scenes. The car-following model is trained in Town 7, characterized by its rural setting and narrow roads, and then tested in Town 4, a mountainous area featuring highways. The left-turn model is trained in Town 1, a small town, and then tested in Town 5, a town with bridges and cross junctions.
  • Figure 5: Structure of AEGIS. Human Attention Network: this pre-trained network predicts human attention from a segmentation image. Policy Network: this network determines the vehicle's policy from a sequence of three segmentation images, starting with a CNN that extracts features. These features are flattened and processed by a self-attention layer, producing machine attention. The machine attention regularizes RL training through a KL-divergence loss against the human attention. The policy network has two MLP prediction heads: one estimates the throttle and brake strength, and the other predicts TTC, which regularizes training through an MSE loss. $\otimes$ denotes the dot product, and $\circledcirc$ denotes the scaled dot product (after normalization and softmax).
  • ...and 13 more figures
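The Figure 5 caption describes flattened CNN features passing through a self-attention layer whose scaled dot-product weights serve as machine attention. A minimal NumPy sketch of that step follows, assuming one token per spatial cell and assuming the machine-attention map is the average attention each cell receives across queries; the weight matrices, the 4x4 grid, and the function names are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_map(features, w_q, w_k, w_v):
    """features: (N, d) array, one token per spatial cell of the flattened CNN output.

    Returns the attended features (N, d_v) and a hypothetical machine-attention
    map of shape (N,) that sums to 1, comparable with a human-attention map.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot product
    weights = softmax(scores, axis=-1)         # (N, N); each row sums to 1
    attended = weights @ v
    # Average attention each cell receives across all queries; sums to 1.
    machine_attn = weights.mean(axis=0)
    return attended, machine_attn

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))               # e.g. a 4x4 feature map with 8 channels
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
attended, machine_attn = self_attention_with_map(feats, w_q, w_k, w_v)
heat = machine_attn.reshape(4, 4)              # back to the spatial grid
```

Reshaping `machine_attn` to the feature-map grid gives a heat map that can be compared cell by cell with a downsampled human-attention map; in the described framework, the attended features would then feed the two MLP heads.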