
Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes

Nakul Agarwal, Yi-Ting Chen, Behzad Dariush

TL;DR

A weakly supervised risk object identification framework is proposed that models the relationship between the driver's intended maneuver and responses to identify potential risk sources. The role of pedestrian attention in estimating risk is also analyzed.

Abstract

Achieving zero-collision mobility remains a key objective for intelligent vehicle systems. Doing so requires understanding driver risk perception, a complex cognitive process shaped by the driver's voluntary response to external stimuli and by the attentiveness of surrounding road users towards the ego-vehicle. To support progress in this area, we introduce RAID (Risk Assessment In Driving scenes), a large-scale dataset curated specifically for research on driver risk perception and contextual risk assessment. RAID comprises 4,691 annotated video clips covering diverse traffic scenarios, with labels for the driver's intended maneuver, road topology, risk situations (e.g., crossing pedestrians), driver responses, and pedestrian attentiveness. Leveraging RAID, we propose a weakly supervised risk object identification framework that models the relationship between the driver's intended maneuver and responses to identify potential risk sources. Additionally, we analyze the role of pedestrian attention in estimating risk and demonstrate the value of the proposed dataset. Experimental evaluations show that our method achieves 20.6% and 23.1% performance gains over prior state-of-the-art approaches on the RAID and HDDS datasets, respectively.

Paper Structure

This paper contains 11 sections, 7 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Perception of risk is a complex cognitive process that is manifested, among other things, by a voluntary response of the driver to external stimuli (e.g., deviating from the planned path in response to a truck that is blocking it) as well as by the apparent attentiveness of participants in the scene (e.g., a crossing cyclist who is not attentive to the ego-vehicle).
  • Figure 2: Annotation statistics of our RAID dataset.
  • Figure 3: Proposed network architecture. The algorithm takes as input a sequence of RGB frames and object tracklets. Agent-level features are extracted using partial convolution and RoIAlign; in each iteration of the network, the partial convolution removes one agent using a binary mask. These features form the nodes of the graph convolution network. In parallel, the RGB frames are used to obtain the driver's action with a temporal encoder-decoder LSTM network. Finally, the feature representation from the graph and the driver's action are combined to predict the driver's response.
  • Figure 4: Risk object identification results on the RAID dataset. Ground truth is shown in red boxes and predictions in green boxes. The ego-vehicle is depicted by the orange car, and the blue arrow shows its future motion direction. A bird's-eye-view representation below each front-view image provides information including the scene layout and the intentions of the traffic participants and the ego-vehicle.
  • Figure 5: Joint risk assessment on RAID. The top row shows detected agents with colored boxes, and the bar chart below uses matching colors for the risk scores. The black line marks the predicted ‘Continue’ score without intervention, and $\star$ shows the adjusted scores after factoring in pedestrian attention.
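
The captions for Figures 3 and 5 describe removing one agent at a time with a binary mask and comparing the result against a 'Continue' score obtained without intervention. The Python sketch below illustrates that loop only. It is not the authors' network: the partial convolution, RoIAlign, graph convolution, and LSTM stages are all replaced by a hypothetical `continue_score` callable, and the ranking rule (the agent whose removal raises the 'Continue' score the most is treated as the likeliest risk object) is an assumption made for illustration. All function names, the box format, and the toy scorer are likewise hypothetical.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # hypothetical format: (x1, y1, x2, y2) in pixels
Scorer = Callable[[np.ndarray, np.ndarray], float]


def agent_mask(shape: Tuple[int, int], box: Box) -> np.ndarray:
    """Binary mask: 1 keeps a pixel, 0 marks the region of the removed agent."""
    mask = np.ones(shape, dtype=np.float32)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = 0.0
    return mask


def rank_risk_objects(
    frame: np.ndarray, boxes: Dict[str, Box], continue_score: Scorer
) -> Tuple[float, List[Tuple[str, float]]]:
    """Score each agent by how much masking it out raises the 'Continue' score."""
    shape = frame.shape[:2]
    # Baseline: no intervention, every pixel is kept.
    baseline = continue_score(frame, np.ones(shape, dtype=np.float32))
    gains = {
        agent_id: continue_score(frame, agent_mask(shape, box)) - baseline
        for agent_id, box in boxes.items()
    }
    # Assumption: a larger gain means the agent is more likely the risk object.
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return baseline, ranked


def toy_continue_score(frame: np.ndarray, mask: np.ndarray) -> float:
    """Stand-in scorer (not a trained model): fewer visible obstruction pixels
    give a higher 'Continue' score."""
    return 1.0 / (1.0 + float((frame * mask).sum()))


# Toy scene: a large obstruction "a" and a small one "b".
frame = np.zeros((10, 10), dtype=np.float32)
frame[0:2, 0:2] = 1.0
frame[5:6, 5:6] = 1.0
boxes = {"a": (0, 0, 2, 2), "b": (5, 5, 6, 6)}
baseline, ranked = rank_risk_objects(frame, boxes, toy_continue_score)
```

In this toy scene, masking out the larger obstruction raises the stand-in score the most, so `ranked` lists `"a"` first. With a real model, `continue_score` would wrap a forward pass that accepts the mask.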