UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving
Daniel Bogdoll, Noël Ollick, Tim Joseph, Svetlana Pavlitska, J. Marius Zöllner
TL;DR
This work presents UMAD, the first fully unsupervised mask-level anomaly detection method for autonomous driving, combining a multimodal world model (MUVO) with unsupervised image segmentation (U2Seg) to detect anomalies without exposure to outliers. The approach computes pixel- and mask-level anomaly scores using diverse difference maps, including visual, perceptual, and temporal cues, and refines these scores through segmentation masks. On the AnoVox benchmark, UMAD demonstrates substantial improvements over a state-of-the-art unsupervised baseline, achieving a notable reduction in false positives at high true-positive rates and establishing a new baseline for unsupervised anomaly detection in driving scenarios. The work also conducts extensive ablations, highlighting the value of mask-based refinement and the impact of segmentation choices, while acknowledging limitations related to reconstruction quality and domain shifts in unsupervised segmentation.
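The mask-based refinement described above can be illustrated with a minimal sketch. It assumes the pixel-level score is a plain absolute difference between an observed frame and a world-model prediction, and that each mask's score is the mean of the pixel scores inside it. Both choices are assumptions made for illustration: the summary does not state how UMAD aggregates scores. The function names `pixel_anomaly_scores` and `mask_refined_scores` are hypothetical, and the toy arrays stand in for MUVO outputs and U2Seg masks.

```python
import numpy as np

def pixel_anomaly_scores(observed, predicted):
    """Per-pixel score: mean absolute difference over channels, (H, W, C) -> (H, W)."""
    diff = np.abs(observed.astype(np.float64) - predicted.astype(np.float64))
    return diff.mean(axis=-1)

def mask_refined_scores(pixel_scores, mask_ids):
    """Replace each pixel's score with the mean score of its segmentation mask.

    Mean aggregation is an assumption for this sketch, not the paper's stated rule.
    """
    # Map arbitrary mask ids to contiguous indices 0..K-1.
    _, inverse = np.unique(mask_ids.ravel(), return_inverse=True)
    sums = np.bincount(inverse, weights=pixel_scores.ravel())
    counts = np.bincount(inverse)
    means = sums / counts
    # Broadcast each mask's mean back to its pixels.
    return means[inverse].reshape(pixel_scores.shape)

# Toy example: a 2x4 RGB frame where the prediction misses one pixel.
observed = np.zeros((2, 4, 3))
predicted = np.zeros((2, 4, 3))
predicted[0, 0] = 1.0
mask_ids = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])  # two hypothetical segmentation masks

pixel_scores = pixel_anomaly_scores(observed, predicted)
refined = mask_refined_scores(pixel_scores, mask_ids)
```

In this toy case the single deviating pixel spreads its score across the whole left mask, while the right mask stays at zero. This shows how a mask-level score marks entire objects as anomalous instead of isolated pixels.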
Abstract
Dealing with atypical traffic scenarios remains a challenging task in autonomous driving. However, most anomaly detection approaches cannot be trained on raw sensor data but require exposure to outlier data and powerful semantic segmentation models trained in a supervised fashion. This limits the representation of normality to labeled data, which does not scale well. In this work, we revisit unsupervised anomaly detection and present UMAD, leveraging generative world models and unsupervised image segmentation. Our method outperforms state-of-the-art unsupervised anomaly detection.
