
Neuro-Cognitive Reward Modeling for Human-Centered Autonomous Vehicle Control

Zhuoli Zhuang, Yu-Cheng Chang, Yu-Kai Wang, Thomas Do, Chin-Teng Lin

Abstract

Recent advancements in computer vision have accelerated the development of autonomous driving. Despite these advancements, training machines to drive in a way that aligns with human expectations remains a significant challenge. Human factors remain essential, as humans possess a sophisticated cognitive system capable of rapidly interpreting scene information and making accurate decisions. Aligning machines with human intent has been explored through Reinforcement Learning with Human Feedback (RLHF). Conventional RLHF methods rely on collecting human preference data by manually ranking generated outputs, which is time-consuming and indirect. In this work, we propose an electroencephalography (EEG)-guided decision-making framework that incorporates human cognitive insights into reinforcement learning (RL) for autonomous driving without interrupting behavioural responses. We collected EEG signals from 20 participants in a realistic driving simulator and analyzed event-related potentials (ERPs) in response to sudden environmental changes. Our framework employs a neural network to predict ERP strength directly from visual scene information. Moreover, we explore integrating this cognitive signal into the reward function of the RL algorithm. Experimental results show that our framework improves the collision-avoidance ability of the RL algorithm, highlighting the potential of neuro-cognitive feedback in enhancing autonomous driving systems. Our project page is: https://alex95gogo.github.io/Cognitive-Reward/.
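The reward integration described in the abstract can be sketched as a simple shaping term. The function name, the subtractive penalty form, and the weight value below are illustrative assumptions, not the paper's exact formulation:

```python
def shaped_reward(env_reward: float, erp_prob: float, weight: float = 0.5) -> float:
    """Blend the environment reward with a predicted ERP probability.

    erp_prob is the ERP-prediction model's output in [0, 1]: the predicted
    probability that the current scene evokes a strong ERP (i.e., a perceived
    emergency). Penalizing it steers the policy away from such states.
    The subtraction and the default weight of 0.5 are assumptions made
    for illustration only.
    """
    return env_reward - weight * erp_prob
```

Under this sketch, a scene the ERP model flags as an emergency (erp_prob near 1) reduces the reward, while a benign scene leaves it essentially unchanged.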


Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: The framework of the human cognitive reward model. First, raw EEG data are preprocessed and the most prominent feature, the ERP, is extracted. Second, an EEG feature prediction model predicts the ERP from the scene images. Lastly, the prediction probability is used as part of the reward of reinforcement learning for autonomous driving tasks.
  • Figure 2: Upper: The dataset collection environment. The HTC VIVE Pro Eye VR headset and Logitech G923 Racing Wheel and Pedals give the subject a more realistic driving experience. Lower: example scene images from the driver's view, the camera's view, and a bird's-eye view.
  • Figure 3: The policy network determines the vehicle's policy from a sequence of three segmentation images, starting with a CNN to extract features. These features are then processed through a self-attention layer. The policy network includes two MLP prediction heads: one for estimating the throttle and brake strength and another for predicting time-to-collision (TTC), which aids in training regularization.
  • Figure 4: The average ERP wave from 20 participants. Condition 1: participants actively react to the emergency braking. Condition 2: participants are not required to react actively. The gray region indicates a significant difference between conditions.
  • Figure 5: Machine attention visualization of the policy network of RL across three time steps in the emergency braking scenario. Our model consistently focuses on the lead vehicle.
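The self-attention step in the policy network of Figure 3 can be illustrated with a minimal NumPy sketch over a sequence of three per-frame feature vectors. Learned query/key/value projections are omitted for brevity, so this shows only the scaled dot-product attention mechanism itself, not the paper's exact layer:

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    # x: (T, d) sequence of per-frame CNN features (T = 3 frames in the paper).
    # Identity projections are assumed here; a real layer would apply learned
    # W_q, W_k, W_v matrices before computing attention.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # (T, T) pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the sequence
    return weights @ x                             # (T, d) attended features
```

Each output frame is a convex combination of all three input frames, which is how the layer lets the policy weigh earlier frames (e.g., the lead vehicle's motion) when estimating throttle, brake, and TTC.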