LATTE: Lightweight Attention-based Traffic Accident Anticipation Engine

Jiaxun Zhang, Yanchen Guan, Chengyue Wang, Haicheng Liao, Guohui Zhang, Zhenning Li

TL;DR

LATTE addresses real-time traffic-accident anticipation under edge-computing constraints by integrating four synergistic components: Efficient Multiscale Spatial Aggregation (EMSA) for scalable spatial features, Memory Attention Aggregation (MAA) for memory-efficient temporal modeling, Auxiliary Self-Attention Aggregation (AAA) for extended temporal dependencies, and Flamingo Alert-Assisted System (FAA) for multilingual verbal hazard alerts. A probabilistic frame-level scorer, aided by Bayesian-style uncertainty, informs alerts while a two-tier training loss couples frame-level urgency with video-level semantic alignment. Empirical results on CCD, DAD, and A3D show state-of-the-art predictive performance on DAD (AP ≈ 89.7%), with substantial FLOPs and parameter reductions enabling real-time edge deployment, and robust ablations validating each architectural component. The framework demonstrates meaningful improvements in early warning windows and passenger situational awareness, suggesting practical impact for autonomous- and mixed-traffic safety systems, while outlining future directions for multimodal fusion and domain adaptation to address complex urban driving conditions.

Abstract

Accurately predicting traffic accidents in real time is a critical challenge in autonomous driving, particularly in resource-constrained environments. Existing solutions often suffer from high computational overhead or fail to adequately address the uncertainty of evolving traffic scenarios. This paper introduces LATTE, a Lightweight Attention-based Traffic Accident Anticipation Engine, which integrates computational efficiency with state-of-the-art performance. LATTE employs Efficient Multiscale Spatial Aggregation (EMSA) to capture spatial features across scales, Memory Attention Aggregation (MAA) to enhance temporal modeling, and Auxiliary Self-Attention Aggregation (AAA) to extract latent dependencies over extended sequences. Additionally, LATTE incorporates the Flamingo Alert-Assisted System (FAA), leveraging a vision-language model to provide real-time, cognitively accessible verbal hazard alerts, improving passenger situational awareness. Evaluations on benchmark datasets (DAD, CCD, A3D) demonstrate LATTE's superior predictive capabilities and computational efficiency. LATTE achieves a state-of-the-art 89.74% Average Precision (AP) on the DAD benchmark, with a 5.4% higher mean Time-To-Accident (mTTA) than the second-best model, and maintains a competitive mean Time-To-Accident at a recall of 80% (TTA@R80, 4.04 s) while demonstrating robust accident anticipation across diverse driving conditions. Its lightweight design delivers a 93.14% reduction in floating-point operations (FLOPs) and a 31.58% decrease in parameter count (Params), enabling real-time operation on resource-limited hardware without compromising performance. Ablation studies confirm the effectiveness of LATTE's architectural components, while visualizations and failure case analyses highlight its practical applicability and areas for enhancement.

Paper Structure

This paper contains 20 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overall framework architecture of LATTE. First, vehicle detection and feature extraction simultaneously capture object-level bounding boxes and object/frame-level features. These heterogeneous features are concatenated to form a multi-scale input tensor. This tensor is then fed into the Efficient Multiscale Spatial Aggregation module, the Memory Attention Aggregation module, and the Auxiliary Self-Attention Aggregation module to obtain more precise spatial and temporal features. The refined features are fused to derive calibrated accident probability scores. Finally, the Flamingo Alert-Assisted System synthesizes and interprets these outputs to produce contextual natural-language alerts in real time.
  • Figure 2: Annotation statistics of the CCD dataset. The histogram shows the variability of environmental conditions (weather patterns and illumination states), ego-vehicle engagement dynamics, and scenario complexity across 4,500 annotated clips. The stratified training-test partition (4:1 ratio) supports robust evaluation of accident anticipation systems and enables modeling of traffic interactions across heterogeneous driving contexts, supporting scenario-aware learning for proactive accident anticipation.
  • Figure 3: Visualization of multi-category accident instances in the DAD dataset, showcasing: (a) diverse detected traffic participants (marked by the yellow box) and accident types; (b) scenario variations encompassing meteorological conditions (rain/snow/fog), illumination levels (daytime/night), and perspective configurations.
  • Figure 4: Anticipation of an accident-positive scenario. LATTE predicts the accident 3.7 seconds prior to its occurrence. Green bounding boxes denote objects unrelated to the accident, yellow bounding boxes mark accident-related objects, and orange bounding boxes highlight the accident participants at the actual moment of the accident. The probability plot shows the prediction surpassing the 0.5 threshold, supported by FAA’s verbal alert.
  • Figure 5: Anticipation of an accident-negative scenario. LATTE correctly maintains a low accident probability as the main vehicle navigates safely. Peaks in predictions around frames 20 and 70 are attributed to the proximity of a delivery truck but resolve as the risk decreases.
  • ...and 1 more figures
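
The Figure 4 caption describes an anticipation time read off a frame-level probability curve: the accident counts as anticipated once the predicted probability surpasses the 0.5 threshold, and the lead time is the gap between that frame and the accident frame. The sketch below illustrates this calculation only; it is not the authors' evaluation code. The function name `time_to_accident`, the frame rate of 20 fps, the accident frame index of 90, and the synthetic probability curve are all assumptions chosen for illustration, not values taken from the paper.

```python
from typing import Optional, Sequence


def time_to_accident(
    probs: Sequence[float],
    accident_frame: int,
    fps: float,
    threshold: float = 0.5,
) -> Optional[float]:
    """Return the lead time in seconds between the first frame whose
    probability exceeds `threshold` and the accident frame.

    Returns None if the threshold is never exceeded before the accident.
    """
    # Only frames before the accident can count as an anticipation.
    for frame_idx, p in enumerate(probs[:accident_frame]):
        if p > threshold:
            return (accident_frame - frame_idx) / fps
    return None


# Hypothetical curve: low risk for 16 frames, then elevated risk.
# fps=20 and accident_frame=90 are illustrative assumptions.
probs = [0.1] * 16 + [0.6] * 84
tta = time_to_accident(probs, accident_frame=90, fps=20.0)
print(tta)  # 3.7
```

With these assumed values the first crossing occurs at frame 16, giving (90 - 16) / 20 = 3.7 seconds. A clip whose probability never exceeds the threshold before the accident frame yields `None`, which corresponds to a missed anticipation.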