AHMF: Adaptive Hybrid-Memory-Fusion Model for Driver Attention Prediction

Dongyang Xu, Qingfan Wang, Ji Ma, Xiangyun Zeng, Lei Chen

TL;DR

This work addresses the gap between data-driven driver attention prediction and human cognitive processes by introducing Adaptive Hybrid-Memory-Fusion (AHMF), which explicitly models working memory for scene comprehension and long-term memory for experience-based retrieval. AHMF uses a temporal-spatial working memory encoder and an attention-based hybrid memory fusion mechanism to integrate current hazardous stimuli with cross-dataset experiences via cross-attention, augmented by domain-adaptation modules. The approach achieves state-of-the-art performance across multiple public datasets, with notable improvements in SIM, NSS, and related metrics, and demonstrates the importance of memory fusion over purely perceptual features. This has practical implications for more robust and generalizable driver assistance systems, and it opens avenues for deeper cognitive-science–inspired memory modeling in computer vision tasks.

Abstract

Accurate driver attention prediction can serve as a critical reference for intelligent vehicles in understanding traffic scenes and making informed driving decisions. Although existing studies on driver attention prediction have improved performance by incorporating advanced saliency detection techniques, they have overlooked the opportunity to achieve human-inspired prediction by analyzing driving tasks from a cognitive science perspective. During driving, drivers' working memory and long-term memory play crucial roles in scene comprehension and experience retrieval, respectively. Together, they form situational awareness, enabling drivers to quickly understand the current traffic situation and make optimal decisions based on past driving experiences. To explicitly integrate these two types of memory, this paper proposes an Adaptive Hybrid-Memory-Fusion (AHMF) driver attention prediction model to achieve more human-like predictions. Specifically, the model first encodes information about specific hazardous stimuli in the current scene to form working memories. Then, it adaptively retrieves similar situational experiences from the long-term memory for the final prediction. Using domain adaptation techniques, the model is trained in parallel across multiple datasets, thereby enriching the accumulated driving experience within the long-term memory module. Compared to existing models, our model demonstrates significant improvements across various metrics on multiple public datasets, supporting the effectiveness of integrating hybrid memories in driver attention prediction.
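
The retrieval step described above, in which working-memory features query a long-term memory, can be illustrated with a generic scaled dot-product cross-attention. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `fuse_memories`, the feature sizes, the random inputs, the residual sum, and the omission of learned query/key/value projections are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_memories(working, long_term):
    """Let each working-memory token attend over all long-term memory slots.

    working:   list of n_tokens vectors of length d (hypothetical scene features)
    long_term: list of n_slots vectors of length d (hypothetical stored experiences)
    Learned projections are omitted here (assumption: identity projections).
    Returns the fused tokens and the attention weights.
    """
    d = len(working[0])
    fused, all_weights = [], []
    for q in working:
        # Similarity of the current token to every stored experience.
        scores = [dot(q, k) / math.sqrt(d) for k in long_term]
        weights = softmax(scores)
        # Weighted sum of long-term slots: the "retrieved" experience.
        retrieved = [
            sum(w * slot[i] for w, slot in zip(weights, long_term))
            for i in range(d)
        ]
        # Residual sum keeps the current-scene information alongside it.
        fused.append([qi + ri for qi, ri in zip(q, retrieved)])
        all_weights.append(weights)
    return fused, all_weights


random.seed(0)
d = 8
working = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
long_term = [[random.gauss(0, 1) for _ in range(d)] for _ in range(10)]
fused, weights = fuse_memories(working, long_term)
print(len(fused), len(fused[0]))
```

Each row of `weights` is a distribution over memory slots, so slots that resemble the current scene contribute most to the retrieved vector. A real model would add learned projections and operate on spatial feature maps rather than a handful of random vectors.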

Paper Structure

This paper contains 20 sections, 8 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Overview of the proposed AHMF driver attention prediction model.
  • Figure 2: Qualitative results of the predicted driver attention maps. From left to right: raw inputs, ground-truth attention maps, our predictions, MLNet [cornia2016deep], and PGNet [wang2021pgnet].
  • Figure 3: Qualitative results of the predicted driver attention maps on the DReyeVE dataset. From left to right: raw inputs, ground-truth attention maps, our predictions, MLNet [cornia2016deep], and PGNet [wang2021pgnet].
  • Figure 4: Qualitative results of the predicted driver attention maps on the BDD-A dataset. From left to right: raw inputs, ground-truth attention maps, our predictions, MLNet [cornia2016deep], and PGNet [wang2021pgnet].