Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge

Anh-Kiet Duong, Petra Gomez-Krämer

TL;DR

The paper tackles hazard analysis in dashcam footage under the out-of-label COOOL benchmark, covering three tasks: detecting driver reactions, identifying hazardous objects, and generating descriptive captions. It introduces a three-part pipeline: unsupervised speed and audio anomaly detection for driver reactions, an ensemble of heuristic rules refined with differential privacy for hazard detection, and vision-language models (BLIPv2/BLIP/CLIP) for captioning. The approach achieves top performance on both the public and private COOOL leaderboards, indicating robustness to sparse labels and the value of combining anomaly detection, a differentially private ensemble, and captioning for hazard analysis. The work offers a practical framework for hazard understanding in autonomous driving contexts, with potential for deployment under limited supervision.

Abstract

This paper presents a novel approach for hazard analysis in dashcam footage, addressing the detection of driver reactions to hazards, the identification of hazardous objects, and the generation of descriptive captions. We first introduce a method for detecting driver reactions through speed and sound anomaly detection, leveraging unsupervised learning techniques. For hazard detection, we employ a set of heuristic rules as weak classifiers, which are combined using an ensemble method. This ensemble approach is further refined with differential privacy to mitigate overconfidence, ensuring robustness despite the lack of labeled data. Lastly, we use state-of-the-art vision-language models for hazard captioning, generating descriptive labels for the detected hazards. Our method achieved the highest scores in the Challenge on Out-of-Label in Autonomous Driving, demonstrating its effectiveness across all three tasks. The source code is publicly available at https://github.com/ffyyytt/COOOL_2025.
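
To make the hazard-detection idea in the abstract more concrete, the following is a minimal sketch of combining heuristic rules as weak classifiers and perturbing the ensemble score with noise. It is not the authors' implementation: the rule definitions, the track fields (`center_x`, `area_growth`, `mean_area`), the thresholds, and the choice of Laplace noise added to the averaged vote are all assumptions made for illustration. The abstract only states that differential privacy is used to mitigate overconfidence, not how.

```python
import random
from typing import Callable, Dict, List

# Hypothetical per-track summary; every key below is an illustrative assumption,
# not a field of the COOOL annotations.
Track = Dict[str, float]
Rule = Callable[[Track], float]


def rule_near_center(track: Track) -> float:
    """Vote 1.0 if the object's normalized horizontal position is near the frame center."""
    return 1.0 if abs(track["center_x"] - 0.5) < 0.2 else 0.0


def rule_growing(track: Track) -> float:
    """Vote 1.0 if the bounding-box area grows over time (object may be approaching)."""
    return 1.0 if track["area_growth"] > 0.0 else 0.0


def rule_large(track: Track) -> float:
    """Vote 1.0 if the object occupies a non-trivial share of the frame."""
    return 1.0 if track["mean_area"] > 0.05 else 0.0


RULES: List[Rule] = [rule_near_center, rule_growing, rule_large]


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws, times scale."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_ensemble_scores(
    tracks: Dict[int, Track], rules: List[Rule], scale: float, seed: int = 0
) -> Dict[int, float]:
    """Average the weak-rule votes per track, then add Laplace noise to each score."""
    rng = random.Random(seed)
    scores: Dict[int, float] = {}
    for track_id, track in tracks.items():
        mean_vote = sum(rule(track) for rule in rules) / len(rules)
        scores[track_id] = mean_vote + laplace_noise(rng, scale)
    return scores


def pick_hazard(scores: Dict[int, float]) -> int:
    """Return the tracking ID with the highest (noisy) ensemble score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    tracks = {
        1: {"center_x": 0.50, "area_growth": 0.10, "mean_area": 0.10},
        2: {"center_x": 0.90, "area_growth": -0.10, "mean_area": 0.01},
    }
    scores = noisy_ensemble_scores(tracks, RULES, scale=0.1, seed=42)
    print(scores, "-> hazard track:", pick_hazard(scores))
```

In this sketch the noise scale controls how much the ensemble's vote is softened: with `scale=0.0` the scores reduce to the plain average of the rule votes, while larger values make near-ties between tracks less decisive.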

Paper Structure

This paper contains 11 sections, 1 figure, 1 table, 2 algorithms.

Figures (1)

  • Figure 1: Sample frames from some videos of the COOOL dataset [alshami2024coool]. The red bounding box denotes the challenge_object, while the blue bounding box represents the traffic_scene, as labeled in the annotations_public file. The number on each bounding box is the tracking ID of the corresponding object.