
Hybrid Video Anomaly Detection for Anomalous Scenarios in Autonomous Driving

Daniel Bogdoll, Jan Imhof, Tim Joseph, Svetlana Pavlitska, J. Marius Zöllner

TL;DR

This work tackles the challenge of detecting temporal anomalies in autonomous driving by extending the HF$^2$-VAD framework to ego-vehicle video data, enabling dense pixel-wise anomaly localization. The approach combines ML-MemAE-SC for optical-flow reconstruction with a CVAE for future-frame prediction, operating on ego-centric optical flows and vehicle bounding boxes to learn normal driving behavior from CARLA simulations. The frame-wise anomaly score $S_f$ fuses a flow-reconstruction error $S_r$ and a future-frame prediction error $S_p$ via $S_f = w_r \cdot \frac{S_r - \mu_r}{\sigma_r} + w_p \cdot \frac{S_p - \mu_p}{\sigma_p}$, where $w_r$ and $w_p$ are weights and each error is standardized by its mean $\mu$ and standard deviation $\sigma$. A pixel-wise score $S_{\text{pixel}}$ aggregates robustly scaled per-pixel MSE within bounding boxes to localize anomalies, yielding dense anomaly maps. Evaluated on AnoVox with sudden braking scenarios, the method detects and localizes anomalies under varying weather and traffic conditions, though it remains sensitive to the reliability of the object detector. This work advances practical video anomaly detection (VAD) for autonomous driving by delivering high-resolution, localized anomaly information and by highlighting avenues for improving robustness in real-world deployments.
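
The two scores described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the statistics $\mu$ and $\sigma$ are assumed to be estimated on normal training data, the weights default to 1, the pixel-wise map uses only a single frame-prediction error (how the paper aggregates errors is not specified here), and "robust scaling" is assumed to mean subtracting a median and dividing by an interquartile range. The function names `frame_score` and `pixel_score_map` and the `(x0, y0, x1, y1)` box format are hypothetical.

```python
import numpy as np


def zscore(scores, mu, sigma):
    """Standardize raw errors; mu and sigma are assumed to come from normal data."""
    return (np.asarray(scores, dtype=float) - mu) / sigma


def frame_score(s_r, s_p, stats, w_r=1.0, w_p=1.0):
    """Frame-wise score S_f = w_r * (S_r - mu_r)/sigma_r + w_p * (S_p - mu_p)/sigma_p."""
    return (w_r * zscore(s_r, stats["mu_r"], stats["sigma_r"])
            + w_p * zscore(s_p, stats["mu_p"], stats["sigma_p"]))


def pixel_score_map(pred, target, boxes, median, iqr):
    """Robustly scaled per-pixel MSE, kept only inside bounding boxes.

    pred, target: predicted and actual next frame, shape (H, W, C).
    boxes: hypothetical (x0, y0, x1, y1) pixel boxes, upper bounds exclusive.
    median, iqr: assumed robust statistics of per-pixel errors on normal data.
    """
    # Per-pixel MSE averaged over color channels -> shape (H, W).
    err = ((pred - target) ** 2).mean(axis=-1)
    scaled = (err - median) / iqr
    # Pixels outside every box keep a score of zero.
    out = np.zeros_like(scaled)
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] = scaled[y0:y1, x0:x1]
    return out


if __name__ == "__main__":
    stats = {"mu_r": 1.0, "sigma_r": 0.5, "mu_p": 2.0, "sigma_p": 1.0}
    print(frame_score([1.0, 2.0], [2.0, 4.0], stats))  # [0. 4.]

    pred = np.zeros((4, 4, 3))
    target = np.zeros((4, 4, 3))
    target[1, 1] = 2.0  # error inside the box
    target[3, 3] = 2.0  # error outside the box, ignored
    print(pixel_score_map(pred, target, [(0, 0, 2, 2)], median=0.0, iqr=1.0))
```

Standardizing both errors before weighting keeps one error type from dominating $S_f$ merely because of its numeric range; restricting the pixel map to boxes mirrors the dependence on object detection noted above.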

Abstract

In autonomous driving, the most challenging scenarios can only be detected within their temporal context. Most video anomaly detection approaches focus either on surveillance or traffic accidents, which are only a subfield of autonomous driving. We present HF$^2$-VAD$_{AD}$, a variation of the HF$^2$-VAD surveillance video anomaly detection method for autonomous driving. We learn a representation of normality from a vehicle's ego perspective and evaluate pixel-wise anomaly detections in rare and critical scenarios.

Paper Structure

This paper contains 7 sections, 4 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Exemplary anomaly detection with HF$^2$-VAD$_{AD}$. The vehicle in front performs a sudden braking maneuver, highlighted in yellow in the top-left graph. The graph shows the anomaly scores per frame, which peak while the vehicle performs the maneuver. The right frame shows pixel-wise anomaly scores for all instances of the class vehicle; these scores are computed only within class-specific bounding boxes predicted by an object detector.
  • Figure 2: HF$^2$-VAD$_{AD}$: Adaptation and extension of HF$^2$-VAD for autonomous driving. Optical flows $y_{1:t}$ and bounding boxes $x_{1:t}$ for relevant objects are generated for each input image. ML-MemAE-SC reconstructs the optical flows as $\hat{y}_{1:t}$, using memory modules $M$ so that only normal patterns are reconstructed well. The CVAE predicts a future frame $\hat{x}_{t+1}$. From these outputs, image-wise and localized pixel-wise anomaly scores are computed. Adapted from Liu et al. (2021).
  • Figure 3: Sequential Anomaly Detection: The first image shows the lead vehicle driving normally, while the last two images show an anomaly with the pixel-wise ground truth overlaid in red. Other, regularly driving traffic participants are also present. As the anomaly maps show, HF$^2$-VAD$_{AD}$ successfully detects the unknown maneuver.
  • Figure 4: Distribution of braking behaviors of other traffic participants in training (blue) and test (red) data. High-intensity sudden braking scenarios, labeled as anomalous, only occur in the evaluation dataset.
  • Figure 5: Comparison of ROC curves for the highway setting with rain. The x marks the FPR$_{95}$ position. On the left, a low FPR$_{95}$ is achieved with predicted bounding boxes. On the right, the FPR$_{95}$ rises sharply with ground truth bounding boxes.
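
Figure 5 reports FPR$_{95}$, the false-positive rate at the operating point where 95% of anomalous samples are detected. The following is a minimal NumPy sketch of that common definition, not the authors' evaluation code; the function name `fpr_at_95_tpr`, the `score >= threshold` convention, and treating each score as one sample (a frame or a pixel) are assumptions.

```python
import numpy as np


def fpr_at_95_tpr(scores, labels, target_tpr=0.95):
    """False-positive rate at the highest threshold whose TPR reaches target_tpr.

    scores: anomaly scores (higher = more anomalous).
    labels: 1 for anomalous samples, 0 for normal ones.
    A sample is flagged as anomalous when its score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both anomalous and normal samples")
    # Sweep thresholds from high to low; TPR and FPR can only grow as it drops,
    # so the first threshold reaching the target TPR gives the lowest FPR.
    for t in np.unique(scores)[::-1]:
        flagged = scores >= t
        tpr = (flagged & labels).sum() / n_pos
        if tpr >= target_tpr:
            return (flagged & ~labels).sum() / n_neg
    return 1.0  # only reached if target_tpr > 1


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.95, 0.6, 0.7, 0.9]
    labels = [0, 0, 0, 1, 1, 1]
    # 1/3: one normal sample (0.95) outranks every anomalous sample.
    print(fpr_at_95_tpr(scores, labels))
```

A lower FPR$_{95}$ is better: it means fewer normal frames or pixels are flagged once the threshold is loose enough to catch nearly all anomalies.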