ROAR: Robust Accident Recognition and Anticipation for Autonomous Driving

Xingcheng Liu, Yanchen Guan, Haicheng Liao, Zhengbing He, Zhenning Li

TL;DR

ROAR combines a Discrete Wavelet Transform, a self-adaptive object-aware module, and a dynamic focal loss to tackle sensor degradation, environmental noise, and imbalanced data distributions, and demonstrates robustness in real-world conditions.

Abstract

Accurate accident anticipation is essential for enhancing the safety of autonomous vehicles (AVs). However, existing methods often assume ideal conditions, overlooking challenges such as sensor failures, environmental disturbances, and data imperfections, which can significantly degrade prediction accuracy. Additionally, previous models have not adequately addressed the considerable variability in driver behavior and accident rates across different vehicle types. To overcome these limitations, this study introduces ROAR, a novel approach for accident detection and prediction. ROAR combines a Discrete Wavelet Transform (DWT), a self-adaptive object-aware module, and a dynamic focal loss to tackle these challenges. The DWT effectively extracts features from noisy and incomplete data, while the object-aware module improves accident prediction by focusing on high-risk vehicles and modeling the spatial-temporal relationships among traffic agents. Moreover, the dynamic focal loss mitigates the impact of class imbalance between positive and negative samples. Evaluated on three widely used datasets, namely the Dashcam Accident Dataset (DAD), the Car Crash Dataset (CCD), and the AnAn Accident Detection (A3D) dataset, our model consistently outperforms existing baselines on key metrics such as Average Precision (AP) and mean Time to Accident (mTTA). These results demonstrate the model's robustness in real-world conditions, particularly in handling sensor degradation, environmental noise, and imbalanced data distributions. This work offers a promising solution for reliable and accurate accident anticipation in complex traffic environments.
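
The abstract does not give the exact form of ROAR's dynamic focal loss. As a rough illustration of the underlying idea, the sketch below implements the standard binary focal loss, FL = -a_t (1 - p_t)^gamma log(p_t), which down-weights easy, confidently classified samples so that rare accident clips are not swamped by normal ones. The function name `focal_loss` and the rule that sets `alpha` from the batch's class balance when none is given are hypothetical choices for this example, not the paper's formulation.

```python
import math
from typing import Optional, Sequence


def focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[float] = None,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss, FL = -a_t * (1 - p_t)**gamma * log(p_t).

    probs  : predicted probability of the positive (accident) class per sample
    labels : 1 for an accident sample, 0 for a normal sample
    alpha  : weight on the positive class; if None it is derived from the
             batch's class balance (a hypothetical "dynamic" rule, not ROAR's)
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")

    if alpha is None:
        pos_frac = sum(labels) / len(labels)
        # Assumption: give the rarer class the larger weight, clamped away from 0 and 1.
        alpha = min(max(1.0 - pos_frac, 0.05), 0.95)

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)         # keep log() finite
        p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
        a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)**gamma shrinks the loss of easy, confidently correct samples
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0, 0, 0, 0]  # 1 accident clip vs. 7 normal clips
    probs = [0.6, 0.1, 0.2, 0.05, 0.3, 0.1, 0.15, 0.4]
    print(f"gamma=0 (weighted BCE): {focal_loss(probs, labels, gamma=0.0):.4f}")
    print(f"gamma=2 (focal):        {focal_loss(probs, labels, gamma=2.0):.4f}")
```

Setting `gamma=0` recovers an alpha-weighted binary cross-entropy, which makes it easy to compare how much the focusing term reduces the contribution of well-classified normal clips.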

Paper Structure

This paper contains 22 sections, 32 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of different levels of sensor distortion in the same scene relative to the raw video. The quality of the signal transmitted by the sensor directly impacts the input data, which in turn affects the model's ability to accurately anticipate accidents.
  • Figure 2: Overview of the proposed framework. The input video frames are processed by an object detector and feature extractor to obtain object-level and image-level features. These features are refined through a Self-Adaptive Object-Aware Module and Discrete Wavelet Transform (DWT). The refined features are fused and passed through a GRU and Temporal Attention Fusion to compute the anticipation probability ($p_t$) and enhancement probability ($p_e$), with a time weight layer adjusting temporal influence on predictions. The framework integrates spatial, temporal, and hierarchical features for enhanced prediction accuracy.
  • Figure 3: Comparison between Fourier Transform and Wavelet Transform. The upper panel illustrates how the Fourier Transform decomposes a signal into constituent sinusoids of different frequencies, which represent the signal in the frequency domain. The lower panel demonstrates how the Wavelet Transform decomposes a signal into wavelets of different scales and positions, offering a time-frequency representation that preserves both time and frequency information. This comparison highlights the advantages of Wavelet Transform in capturing localized features of the signal, particularly for non-stationary or time-varying data.
  • Figure 4: One-level Discrete Wavelet Transform (DWT) decomposition applied to discrete data points. The data is passed through a high-pass filter to extract high-frequency components (denoted as $cD$) and through a low-pass filter to extract low-frequency components (denoted as $cA$). Each filtered output is downsampled by two, so the original data is split into two half-length sets of coefficients: the high-pass branch captures rapid (high-frequency) variations, and the low-pass branch captures smooth (low-frequency) trends.
  • Figure 5: Visualization of ROAR's performance on the DAD dataset with added Gaussian noise of variance 0.2, using a decision threshold of 0.5 for all scenes. Scenes (a) and (b) show successful anticipations: (a) is a smooth driving scenario with no risk, and (b) is a crash that the model correctly anticipates. Scene (c) illustrates a false positive, where high-speed motorcycles are incorrectly predicted to collide, while (d) shows a false negative, where the model misses a small collision in the distance. The accident anticipation probabilities are plotted over time, highlighting ROAR's ability to handle real-world challenges, including noise and dynamic traffic conditions. The model adjusts its predictions to the scene context, even in cases that produce false alarms.
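
Figure 4 describes a one-level DWT as a pair of high-pass and low-pass filters. As a minimal sketch, assuming the Haar wavelet (the summary does not say which wavelet ROAR uses) and a 1-D, even-length sequence such as a single feature channel over time, the decomposition reduces to scaled pairwise sums (the approximation $cA$) and scaled pairwise differences (the detail $cD$), each half the input length. `haar_dwt` and `haar_idwt` are hypothetical helper names; in practice a library such as PyWavelets (`pywt.dwt` / `pywt.idwt`) would normally be used.

```python
import math
from typing import List, Sequence, Tuple

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT: returns (cA, cD), each half the input length."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch expects an even-length sequence of at least 2 samples")
    # Low-pass branch: scaled pairwise sums -> smooth trend (approximation, cA)
    cA = [SQRT_HALF * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    # High-pass branch: scaled pairwise differences -> rapid variation (detail, cD)
    cD = [SQRT_HALF * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return cA, cD


def haar_idwt(cA: Sequence[float], cD: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds the original even-length sequence."""
    out: List[float] = []
    for a, d in zip(cA, cD):
        out.append(SQRT_HALF * (a + d))
        out.append(SQRT_HALF * (a - d))
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 9.0, 1.0]  # last pair holds a sharp jump
    cA, cD = haar_dwt(signal)
    print("cA:", [round(v, 3) for v in cA])
    print("cD:", [round(v, 3) for v in cD])
    # Zeroing the detail coefficients keeps only the low-frequency trend,
    # a crude form of the noise suppression a DWT can provide.
    smooth = haar_idwt(cA, [0.0] * len(cD))
    print("low-pass reconstruction:", [round(v, 3) for v in smooth])
```

Because the Haar transform is orthogonal, `haar_idwt(*haar_dwt(x))` recovers `x` exactly (up to floating-point rounding). Zeroing `cD` only illustrates why the approximation branch is less sensitive to sample-level noise; it is not a description of how ROAR combines $cA$ and $cD$.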