Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge
Anh-Kiet Duong, Petra Gomez-Krämer
TL;DR
The paper tackles hazard analysis in dashcam footage under the out-of-label COOOL benchmark, focusing on driver reactions, hazardous objects, and descriptive captions. It introduces a three-part pipeline: unsupervised speed and audio anomaly detection for driver reactions, an ensemble of heuristic rules for hazard detection that is refined with differential privacy to mitigate overconfidence, and vision-language models (BLIPv2/BLIP/CLIP) for captioning. The approach achieves top performance on both public and private COOOL leaderboards, demonstrating robustness to the lack of labeled data and the value of combining anomaly detection, noise-regularized rule ensembles, and captioning for hazard analysis. This work provides a practical framework for hazard understanding in autonomous driving contexts, with potential for deployment under limited supervision.
Abstract
This paper presents a novel approach for hazard analysis in dashcam footage, addressing three tasks: detecting driver reactions to hazards, identifying hazardous objects, and generating descriptive captions. We first introduce a method for detecting driver reactions through speed and sound anomaly detection, leveraging unsupervised learning techniques. For hazard detection, we employ a set of heuristic rules as weak classifiers, which are combined using an ensemble method. This ensemble is further refined with differential privacy to mitigate overconfidence, improving robustness despite the lack of labeled data. Lastly, we use state-of-the-art vision-language models for hazard captioning, generating descriptive labels for the detected hazards. Our method achieved the highest scores in the Challenge on Out-of-Label in Autonomous Driving (COOOL), demonstrating its effectiveness across all three tasks. Source code is publicly available at https://github.com/ffyyytt/COOOL_2025.
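
To make the hazard-detection idea concrete, the following is a minimal sketch of combining heuristic rules as weak classifiers and perturbing the vote counts with Laplace noise, a common differential-privacy-style mechanism. It is not taken from the authors' repository: the three rules, the track features, the thresholds, and the `noisy_argmax` helper are all hypothetical, and the actual rules and noise mechanism used in the paper may differ.

```python
import numpy as np

def noisy_argmax(votes, epsilon=1.0, rng=None):
    """Pick the candidate with the highest Laplace-perturbed vote count.

    votes: (n_rules, n_candidates) array of 0/1 votes from weak rules.
    epsilon: assumed privacy-style parameter; smaller means more noise,
             which softens an overconfident majority.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = votes.sum(axis=0).astype(float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return int(np.argmax(counts + noise))

# Hypothetical per-track features (illustrative values only):
# column 0: mean bounding-box area as a fraction of the frame
# column 1: mean distance of the box centre from the frame centre
# column 2: number of frames in which the track is visible
tracks = np.array([
    [0.02, 0.40, 12],
    [0.15, 0.05, 40],
    [0.05, 0.30, 25],
])

# Hypothetical heuristic rules, each acting as a weak classifier.
rules = [
    lambda t: t[:, 0] > 0.10,  # object appears large
    lambda t: t[:, 1] < 0.10,  # object is near the centre of the view
    lambda t: t[:, 2] > 30,    # object persists over many frames
]

# One row of 0/1 votes per rule, one column per candidate track.
votes = np.stack([rule(tracks).astype(int) for rule in rules])
counts = votes.sum(axis=0)

hazard_index = noisy_argmax(votes, epsilon=1.0, rng=np.random.default_rng(0))
print("vote counts per track:", counts.tolist())
print("selected hazard track:", hazard_index)
```

With a small `epsilon` the noise can overturn a narrow majority, which is the sense in which such a mechanism tempers overconfident rule agreement; with a very large `epsilon` the selection reduces to plain majority voting.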
