Conditioning Latent-Space Clusters for Real-World Anomaly Classification

Daniel Bogdoll, Svetlana Pavlitska, Simon Klaus, J. Marius Zöllner

TL;DR

This work addresses real-world anomaly detection for autonomous driving on high-resolution urban images. It conditions the latent space of a variational autoencoder (CL-VAE) so that normal and anomalous data form two separable clusters. To emphasize small anomalies, it adds a discrepancy-image channel derived from semantic segmentation and image resynthesis. Two auxiliary loss terms, $L_{distance}$ and $L_i$, reinforce cluster separation, while a perceptual loss preserves reconstruction quality. Empirical results show that latent-space clustering and the discrepancy input can improve anomaly detection, that larger latent spaces and a lower KL weight improve reconstruction quality, and that frequency-based label replacement mitigates bias in the discrepancy module. The method still yields notable false positives, however, which points to a role in active-learning pipelines. Overall, the approach offers a practical way to pre-filter candidate anomalies for human review and further refinement in safety-critical, real-world driving scenarios.

Abstract

Anomalies in the domain of autonomous driving are a major hindrance to the large-scale deployment of autonomous vehicles. In this work, we focus on high-resolution camera data from urban scenes that include anomalies of various types and sizes. Based on a Variational Autoencoder, we condition its latent space to classify samples as either normal data or anomalies. In order to emphasize especially small anomalies, we perform experiments where we provide the VAE with a discrepancy map as an additional input, evaluating its impact on the detection performance. Our method separates normal data and anomalies into isolated clusters while still reconstructing high-quality images, leading to meaningful latent representations.
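
The two ideas in the abstract, feeding a discrepancy map as an extra input and classifying by latent-space cluster, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the nested-list image layout, the fixed cluster centers, and the nearest-center decision rule are hypothetical and are not taken from the paper's implementation.

```python
import math


def add_discrepancy_channel(rgb, discrepancy):
    """Append a per-pixel discrepancy score to each RGB pixel (H x W x 3 -> H x W x 4).

    Hypothetical helper: images are plain nested lists for illustration only.
    """
    same_height = len(rgb) == len(discrepancy)
    same_width = all(len(r) == len(d) for r, d in zip(rgb, discrepancy))
    if not (same_height and same_width):
        raise ValueError("image and discrepancy map must share height and width")
    return [
        [list(pixel) + [score] for pixel, score in zip(row, disc_row)]
        for row, disc_row in zip(rgb, discrepancy)
    ]


def classify_by_nearest_center(latents, center_normal, center_anomaly):
    """Return 0 (normal) or 1 (anomaly) per latent vector.

    Assumption: a sample belongs to the cluster whose center is closer in
    Euclidean distance. The paper's actual decision rule may differ.
    """
    return [
        int(math.dist(z, center_anomaly) < math.dist(z, center_normal))
        for z in latents
    ]


# A 2 x 3 black image with one high-discrepancy pixel.
rgb = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(2)]
disc = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.0]]
four_channel = add_discrepancy_channel(rgb, disc)

# Two toy latent vectors and two assumed cluster centers.
latents = [[0.1, -0.2], [2.9, 3.2]]
labels = classify_by_nearest_center(latents, [0.0, 0.0], [3.0, 3.0])
```

In this toy setup the first latent vector lies near the assumed normal center and the second near the assumed anomaly center, so they receive different labels. In a trained model the centers would come from the conditioned latent space rather than being fixed by hand.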

Paper Structure

This paper contains 14 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Our VAE-based method for real-world anomaly classification, which separates normal and abnormal data in its latent space. Discrepancy images as additional inputs also emphasize small unknown objects, here a cat.
  • Figure 2: We used the Cityscapes [Cordts et al., 2016] and Fishyscapes (normal) [Blum et al., 2021] datasets as normal data (left) and the RoadAnomaly21 [Chan et al., 2021], Fishyscapes (anomalies), and Lost and Found [Pinggera et al., 2016] datasets as anomalous data (right). Reprinted from [Klaus, 2022].
  • Figure 3: Overall architecture of the deployed VAE (left) and the components of the ResBlock (right). Adapted from [Klaus, 2022].
  • Figure 4: Discrepancy images for a Cityscapes image containing an object of the rare but normal class bus. The original approach by Lis et al. [2019] (middle) leads to higher anomaly scores. The proposed frequency-based approach (right) leads to lower anomaly scores. Reprinted from [Klaus, 2022].
  • Figure 5: Distribution of mean anomaly scores in the discrepancy maps generated for the Cityscapes test set, comparing the original approach by Lis et al. [2019] (blue) to our frequency-based label replacement (orange). Reprinted from [Klaus, 2022].
  • ...and 4 more figures