Hybrid Video Anomaly Detection for Anomalous Scenarios in Autonomous Driving
Daniel Bogdoll, Jan Imhof, Tim Joseph, Svetlana Pavlitska, J. Marius Zöllner
TL;DR
This work tackles the challenge of detecting temporal anomalies in autonomous driving by extending the HF$^2$-VAD framework to ego-vehicle video data, enabling dense pixel-wise anomaly localization. The approach combines ML-MemAE-SC for optical-flow reconstruction and a CVAE for future-frame prediction, operating on ego-centric flows and vehicle bounding boxes to learn normal driving from CARLA simulations. The frame-wise anomaly score $S_f$ fuses a flow-reconstruction error $S_r$ and a future-frame prediction error $S_p$ via $S_f = w_r \cdot \frac{S_r - \mu_r}{\sigma_r} + w_p \cdot \frac{S_p - \mu_p}{\sigma_p}$, while a pixel-wise score $S_{\text{pixel}}$ aggregates robustly scaled per-pixel MSE within bounding boxes to localize anomalies, yielding dense anomaly maps. Evaluated on AnoVox with sudden braking scenarios, the method demonstrates effective detection and localization under varying weather and traffic conditions, though it remains sensitive to object-detection reliability. This work advances practical VAD for autonomous driving by delivering high-resolution, localized anomaly insights and highlighting avenues for improving robustness in real-world deployments.
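To make the frame-wise fusion concrete, here is a minimal Python sketch of the formula for $S_f$. It is an illustration under stated assumptions, not the authors' implementation: the function names `error_stats` and `fuse_frame_scores`, the choice to estimate $\mu$ and $\sigma$ from a reference set of errors on normal frames, and the example weights are all hypothetical.

```python
from statistics import mean, pstdev


def error_stats(errors):
    """Return (mu, sigma) of a list of per-frame errors.

    Assumption: the statistics come from errors measured on normal
    (anomaly-free) reference frames.
    """
    return mean(errors), pstdev(errors)


def fuse_frame_scores(s_r, s_p, stats_r, stats_p, w_r=1.0, w_p=1.0):
    """Compute S_f = w_r*(S_r - mu_r)/sigma_r + w_p*(S_p - mu_p)/sigma_p per frame.

    s_r: flow-reconstruction errors, one per frame
    s_p: future-frame prediction errors, one per frame
    stats_r, stats_p: (mu, sigma) pairs used for standardization
    w_r, w_p: fusion weights (the defaults are placeholders, not tuned values)
    """
    mu_r, sigma_r = stats_r
    mu_p, sigma_p = stats_p
    if sigma_r == 0 or sigma_p == 0:
        raise ValueError("sigma must be non-zero to standardize the errors")
    if len(s_r) != len(s_p):
        raise ValueError("s_r and s_p must have one entry per frame")
    return [
        w_r * (r - mu_r) / sigma_r + w_p * (p - mu_p) / sigma_p
        for r, p in zip(s_r, s_p)
    ]


# Toy numbers chosen for easy arithmetic; they are not results from the paper.
stats_r = error_stats([1.0, 3.0])  # mu_r = 2.0, sigma_r = 1.0
stats_p = error_stats([2.0, 6.0])  # mu_p = 4.0, sigma_p = 2.0
scores = fuse_frame_scores([2.0, 5.0], [4.0, 10.0], stats_r, stats_p, w_r=1.0, w_p=0.5)
```

Standardizing each error before weighting puts $S_r$ and $S_p$ on a comparable scale, so the weights $w_r$ and $w_p$ express relative importance rather than compensating for different error magnitudes.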
Abstract
In autonomous driving, the most challenging scenarios can only be detected within their temporal context. Most video anomaly detection approaches focus either on surveillance or traffic accidents, which are only a subfield of autonomous driving. We present HF$^2$-VAD$_{AD}$, a variation of the HF$^2$-VAD surveillance video anomaly detection method for autonomous driving. We learn a representation of normality from a vehicle's ego perspective and evaluate pixel-wise anomaly detections in rare and critical scenarios.
