Conditioning Latent-Space Clusters for Real-World Anomaly Classification
Daniel Bogdoll, Svetlana Pavlitska, Simon Klaus, J. Marius Zöllner
TL;DR
This work addresses real-world anomaly detection in autonomous driving on high-resolution urban images by conditioning the latent space of a variational autoencoder (CL-VAE) to form two separable clusters, one for normal and one for anomalous data. It introduces a discrepancy-image channel, derived from semantic segmentation and image resynthesis, to emphasize small anomalies, and adds two auxiliary loss terms, $L_{distance}$ and $L_i$, to reinforce cluster separation, while a perceptual loss preserves reconstruction quality. Empirical results show that latent-space clustering and the discrepancy input can improve anomaly detection, that larger latent spaces and a lower KL weight improve reconstruction quality, and that frequency-based label replacement mitigates bias in the discrepancy module. The method still yields notable false positives, however, which makes it most useful as a pre-filter: within active-learning pipelines, it can surface candidate anomalies for human review and further refinement in safety-critical, real-world driving scenarios.
Abstract
Anomalies in the domain of autonomous driving are a major hindrance to the large-scale deployment of autonomous vehicles. In this work, we focus on high-resolution camera data from urban scenes that include anomalies of various types and sizes. Building on a Variational Autoencoder (VAE), we condition its latent space to classify samples as either normal data or anomalies. To emphasize small anomalies in particular, we run experiments in which we provide the VAE with a discrepancy map as an additional input and evaluate its impact on detection performance. Our method separates normal data and anomalies into isolated clusters while still reconstructing high-quality images, leading to meaningful latent representations.
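
To make the idea of a conditioned latent space more concrete, the following is a minimal sketch, not the formulation used in this work. It assumes that each class is pulled toward its own unit-variance Gaussian prior via a KL term, and that a hinge-style distance penalty discourages encodings from drifting toward the other cluster. The cluster centers `MU_NORMAL` and `MU_ANOMALY`, the margin, the weights, and all function names are hypothetical; the exact definitions of $L_{distance}$ and $L_i$ are not given here, and the reconstruction (perceptual) loss is omitted.

```python
import math

# Hypothetical cluster centers in a 2-D latent space (assumed values for illustration).
MU_NORMAL = (-2.0, 0.0)
MU_ANOMALY = (2.0, 0.0)


def kl_to_center(mu, logvar, center):
    """KL( N(mu, diag(exp(logvar))) || N(center, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        math.exp(lv) + (m - c) ** 2 - 1.0 - lv
        for m, lv, c in zip(mu, logvar, center)
    )


def distance_penalty(mu, other_center, margin=4.0):
    """Hinge penalty that is positive when an encoding lies within `margin` of the other cluster."""
    return max(0.0, margin - math.dist(mu, other_center))


def conditioning_loss(mu, logvar, is_anomaly, kl_weight=1.0, dist_weight=1.0):
    """Assumed latent-space loss: class-conditional KL plus a separation penalty."""
    if is_anomaly:
        own, other = MU_ANOMALY, MU_NORMAL
    else:
        own, other = MU_NORMAL, MU_ANOMALY
    return kl_weight * kl_to_center(mu, logvar, own) + dist_weight * distance_penalty(mu, other)


def classify(mu):
    """Assign an encoded mean to the nearer of the two cluster centers."""
    if math.dist(mu, MU_ANOMALY) < math.dist(mu, MU_NORMAL):
        return "anomaly"
    return "normal"
```

Under these assumptions, a normal sample encoded exactly at `MU_NORMAL` with unit variance incurs zero loss, while the same encoding labeled as an anomaly is penalized by both terms. At inference time, `classify` needs only the encoder mean, which is what makes a conditioned latent space usable as a classifier.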
