LATTE: Lightweight Attention-based Traffic Accident Anticipation Engine
Jiaxun Zhang, Yanchen Guan, Chengyue Wang, Haicheng Liao, Guohui Zhang, Zhenning Li
TL;DR
LATTE addresses real-time traffic-accident anticipation under edge-computing constraints by integrating four components: Efficient Multiscale Spatial Aggregation (EMSA) for scalable spatial features, Memory Attention Aggregation (MAA) for memory-efficient temporal modeling, Auxiliary Self-Attention Aggregation (AAA) for extended temporal dependencies, and the Flamingo Alert-Assisted System (FAA) for multilingual verbal hazard alerts. A probabilistic frame-level scorer with Bayesian-style uncertainty estimates drives the alerts, and a two-tier training loss couples frame-level urgency with video-level semantic alignment. Across CCD, DAD, and A3D, LATTE reaches state-of-the-art predictive performance on DAD (AP ≈ 89.7%) while cutting FLOPs by 93.14% and parameters by 31.58%, which makes real-time edge deployment feasible; ablations confirm the contribution of each architectural component. The framework lengthens the early-warning window (5.4% higher mTTA than the second-best model on DAD) and improves passenger situational awareness, suggesting practical value for autonomous and mixed-traffic safety systems. Future directions include multimodal fusion and domain adaptation for complex urban driving conditions.
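The two-tier loss is only named here, not specified, so the following is a minimal sketch of one plausible form, assuming a DSA-style exponential urgency weighting for the frame-level term. The video-level term is a hypothetical stand-in (binary cross-entropy on the peak frame score), not LATTE's semantic-alignment loss, and the names `frame_urgency_loss`, `video_level_loss`, `two_tier_loss`, `fps`, and `lam` are illustrative.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp probabilities away from 0 and 1 before taking logs


def frame_urgency_loss(probs: Sequence[float], label: int, toa: int,
                       fps: float = 20.0) -> float:
    """Frame-level term (assumed DSA-style exponential urgency weighting).

    For an accident video, the cross-entropy at frame t is scaled by
    exp(-max(0, (toa - t) / fps)), so frames near the accident weigh more.
    For a normal video, every frame is pushed toward probability 0.
    """
    total = 0.0
    for t, p in enumerate(probs):
        p = min(max(p, EPS), 1.0 - EPS)
        if label:
            weight = math.exp(-max(0.0, (toa - t) / fps))
            total += -weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(probs)


def video_level_loss(probs: Sequence[float], label: int) -> float:
    """Hypothetical video-level stand-in: BCE on the peak frame score."""
    p = min(max(max(probs), EPS), 1.0 - EPS)
    return -math.log(p) if label else -math.log(1.0 - p)


def two_tier_loss(probs: Sequence[float], label: int, toa: int,
                  fps: float = 20.0, lam: float = 0.5) -> float:
    """Weighted sum of the two tiers; lam is an assumed balancing weight."""
    return frame_urgency_loss(probs, label, toa, fps) + lam * video_level_loss(probs, label)
```

With `toa=3` and `fps=1.0`, `two_tier_loss([0.5, 0.5, 0.9], 1, 3, fps=1.0)` is lower than `two_tier_loss([0.9, 0.5, 0.5], 1, 3, fps=1.0)`: both sequences share the same peak score, but the urgency weight rewards confidence close to the accident more than confidence early in the clip.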
Abstract
Accurately predicting traffic accidents in real time is a critical challenge in autonomous driving, particularly in resource-constrained environments. Existing solutions often incur high computational overhead or fail to adequately account for the uncertainty of evolving traffic scenarios. This paper introduces LATTE, a Lightweight Attention-based Traffic Accident Anticipation Engine, which combines computational efficiency with state-of-the-art performance. LATTE employs Efficient Multiscale Spatial Aggregation (EMSA) to capture spatial features across scales, Memory Attention Aggregation (MAA) to enhance temporal modeling, and Auxiliary Self-Attention Aggregation (AAA) to extract latent dependencies over extended sequences. Additionally, LATTE incorporates the Flamingo Alert-Assisted System (FAA), which leverages a vision-language model to provide real-time, cognitively accessible verbal hazard alerts that improve passenger situational awareness. Evaluations on the DAD, CCD, and A3D benchmark datasets demonstrate LATTE's superior predictive capability and computational efficiency. On DAD, LATTE achieves a state-of-the-art Average Precision (AP) of 89.74% and a mean Time-To-Accident (mTTA) 5.4% higher than that of the second-best model, while maintaining a competitive Time-To-Accident at 80% recall (TTA@R80) of 4.04 s and demonstrating robust accident anticipation across diverse driving conditions. Its lightweight design reduces floating-point operations (FLOPs) by 93.14% and parameter count (Params) by 31.58%, enabling real-time operation on resource-limited hardware without compromising performance. Ablation studies confirm the effectiveness of LATTE's architectural components, while visualizations and failure-case analyses highlight its practical applicability and areas for improvement.
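AP, mTTA, and TTA@R80 all come from sweeping an alarm threshold over frame-level scores. The sketch below is a simplified version of that threshold-sweep protocol, assuming an alarm fires at the first frame whose score reaches the threshold and that only frames before the accident count for accident videos. The function name `anticipation_metrics`, the 20 fps default, and the nearest-recall rule for TTA@R80 are illustrative assumptions; published evaluation scripts may normalize and average TTA differently, so values from this sketch are not directly comparable to the reported 89.74% AP or 4.04 s.

```python
from typing import List, Sequence, Tuple


def anticipation_metrics(
    scores: Sequence[Sequence[float]],  # per-video, per-frame accident probabilities
    labels: Sequence[int],              # 1 = accident video, 0 = normal video
    toa: Sequence[int],                 # accident frame index (ignored for normal videos)
    fps: float = 20.0,                  # assumed frame rate, converts frames to seconds
    recall_target: float = 0.8,
) -> Tuple[float, float, float]:
    """Return (AP, mTTA in seconds, TTA at the recall closest to recall_target)."""
    # For accident videos only frames before the accident count: an alarm
    # raised at or after the crash is not an anticipation.
    evals: List[List[float]] = [
        list(s[: toa[i]]) if labels[i] else list(s) for i, s in enumerate(scores)
    ]
    n_pos = sum(labels)
    thresholds = sorted({p for v in evals for p in v}, reverse=True)

    # One (recall, precision, mean TTA) point per threshold, highest threshold first.
    points: List[Tuple[float, float, float]] = []
    for th in thresholds:
        tp = fired = 0
        tta_sum = 0.0
        for i, v in enumerate(evals):
            # The alarm fires at the first frame whose score reaches the threshold.
            first = next((t for t, p in enumerate(v) if p >= th), None)
            if first is None:
                continue
            fired += 1
            if labels[i]:
                tp += 1
                tta_sum += (toa[i] - first) / fps  # seconds of warning before the accident
        if tp == 0:
            continue  # recall is 0 and TTA is undefined at this threshold
        points.append((tp / n_pos, tp / fired, tta_sum / tp))

    if not points:
        return 0.0, 0.0, 0.0

    # Lowering the threshold never removes a detection, so recall is
    # non-decreasing; AP sums precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for recall, precision, _ in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall

    m_tta = sum(t for _, _, t in points) / len(points)
    tta_at_recall = min(points, key=lambda pt: abs(pt[0] - recall_target))[2]
    return ap, m_tta, tta_at_recall
```

For one accident video scored `[0.9, 0.9]` with `toa=2`, a second scored `[0.3, 0.3]` with `toa=2`, and a normal video scored `[0.6, 0.6]` (all at `fps=1.0`), the sweep yields AP = 5/6 and mTTA = 2.0 s: the normal video raises a false alarm before the weaker accident video is detected, which lowers precision at full recall.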
