FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation

Chang Won Lee, Selina Leveugle, Svetlana Stolpner, Chris Langley, Paul Grouchy, Jonathan Kelly, Steven L. Waslander

TL;DR

FlowCLAS addresses scene-level anomaly segmentation under limited labeled data by fusing frozen vision foundation model features with a normalizing flow that models feature density. It introduces Outlier Exposure via pseudo-outliers and latent-space contrastive learning to explicitly separate inliers and outliers in the flow latent space, enabling probabilistic anomaly maps without pixel-level labels. The method achieves state-of-the-art results on ALLO space anomaly segmentation and competitive performance on road anomaly benchmarks, highlighting strong cross-domain generalization and interpretability through exact likelihood maps. Overall, FlowCLAS offers a scalable, domain-agnostic approach that leverages rich pre-trained features while avoiding costly fine-tuning, making it suitable for safety-critical robotics applications.

Abstract

Anomaly segmentation is a valuable computer vision task for safety-critical applications that need to be aware of unexpected events. Current state-of-the-art (SOTA) scene-level anomaly segmentation approaches rely on diverse inlier class labels during training, limiting their ability to leverage vast unlabeled datasets and pre-trained vision encoders. These methods may underperform in domains with reduced color diversity and limited object classes. Conversely, existing unsupervised methods struggle with anomaly segmentation in the diverse scenes of less restricted domains. To address these challenges, we introduce FlowCLAS, a novel self-supervised framework that utilizes vision foundation models to extract rich features and employs a normalizing flow network to learn their density distribution. We enhance the model's discriminative power by incorporating Outlier Exposure and contrastive learning in the latent space. FlowCLAS significantly outperforms all existing methods on the ALLO anomaly segmentation benchmark for space robotics and demonstrates competitive results on multiple road anomaly segmentation benchmarks for autonomous driving, including Fishyscapes Lost&Found and Road Anomaly. These results highlight FlowCLAS's effectiveness in addressing the unique challenges of space anomaly segmentation while retaining SOTA performance in the autonomous driving domain without reliance on inlier segmentation labels.

Paper Structure

This paper contains 25 sections, 9 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Anomaly heatmaps generated by FastFlow (Yu et al., 2021), UNO (Delić et al., 2024), and our approach for an ALLO (Leveugle et al., 2024) test image containing an anomalous object. Our method outperforms both unsupervised and supervised approaches in the presence of limited pixel and label diversity.
  • Figure 2: Given a mixed image $\mathbf{x}^{mix}$ and an outlier image $\mathbf{x}^{out}$, FlowCLAS extracts 2D feature maps using a frozen vision foundation model $f_{\phi}$. These features are then projected into a latent space by a normalizing flow network $\mathbf{f}_{\theta}$, comprising a series of invertible function blocks. The resulting outputs are samples from a multivariate Gaussian distribution $\mathbf{z} \sim \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})$. Parameters $\theta$ are optimized to maximize the likelihood of the latent samples corresponding to normal inputs $\mathbf{z}_{normal}^{mix}$, as determined by the binary mask $\mathbf{y}^{mix}$. Additionally, the latent samples $\mathbf{z}^{\{mix,out\}}$ are projected into a lower-dimensional space, where contrastive learning encourages inter-class dissimilarity and intra-class similarity. During inference, FlowCLAS generates latent maps for a given image, which are then used to compute a likelihood-based anomaly score map.
  • Figure 3: Predicted heatmaps (top) and anomaly score histograms (bottom) from FastFlow (Yu et al., 2021), UNO (Delić et al., 2024), and FlowCLAS for a challenging example from the ALLO test set, where the small anomalous object is subtly integrated within the scene.
  • Figure 4: Semantic segmentation performance of anomaly class on ALLO test set using linear probing. Three backbones are compared: ImageNet-1k pre-trained (Deng et al., 2009) (orange), fine-tuned with foreground-background labels (green), and DINOv2-B (Oquab et al., 2023) (blue). Each pair uses identical architecture with frozen weights. A notable performance decline is observed when using the foreground-background fine-tuned backbone compared to its pre-trained counterpart.
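
The pipeline described in the Figure 2 caption can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and everything in it is a stated assumption: a single elementwise affine map stands in for the flow $\mathbf{f}_{\theta}$, the base distribution is a standard Gaussian (zero mean, identity covariance), a mask value of 0 marks normal positions, the lower-dimensional projection before the contrastive term is omitted, and the contrastive term uses a generic supervised-contrastive form rather than the paper's exact loss. All function names are hypothetical.

```python
import math

def affine_flow(x, log_scale, shift):
    """Toy invertible map z = x * exp(log_scale) + shift (assumed stand-in for f_theta).
    Returns the latent z and log|det J|, which is sum(log_scale) for this map."""
    z = [xi * math.exp(s) + b for xi, s, b in zip(x, log_scale, shift)]
    return z, sum(log_scale)

def log_likelihood(x, log_scale, shift):
    """Exact log p(x) via change of variables, with an assumed standard Gaussian base."""
    z, log_det = affine_flow(x, log_scale, shift)
    log_pz = -0.5 * sum(zi * zi for zi in z) - 0.5 * len(z) * math.log(2 * math.pi)
    return log_pz + log_det

def anomaly_score_map(feature_map, log_scale, shift):
    """Per-position anomaly score = negative log-likelihood of the feature vector."""
    return [[-log_likelihood(f, log_scale, shift) for f in row] for row in feature_map]

def masked_nll(feature_map, mask, log_scale, shift):
    """Mean NLL over positions the mask marks as normal (assumed: 0 = normal)."""
    scores = anomaly_score_map(feature_map, log_scale, shift)
    vals = [s for srow, mrow in zip(scores, mask) for s, m in zip(srow, mrow) if m == 0]
    return sum(vals) / len(vals)

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def contrastive_loss(latents, labels, temperature=0.1):
    """Generic supervised-contrastive loss: same-label latents are pulled together,
    different-label latents are pushed apart."""
    total, count = 0.0, 0
    for i, zi in enumerate(latents):
        others = [j for j in range(len(latents)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(cosine(zi, latents[j]) / temperature) for j in others)
        for j in positives:
            num = math.exp(cosine(zi, latents[j]) / temperature)
            total += -math.log(num / denom)
            count += 1
    return total / count

if __name__ == "__main__":
    # A 1x2 "feature map": one typical feature and one far from the origin.
    feats = [[[0.0, 0.0], [3.0, 3.0]]]
    identity = ([0.0, 0.0], [0.0, 0.0])
    print(anomaly_score_map(feats, *identity))
    print(masked_nll(feats, [[0, 1]], *identity))
```

In this sketch, training would minimize `masked_nll` plus a weighted `contrastive_loss` over latents labeled as inlier or outlier, while inference only needs `anomaly_score_map`. The relative weighting of the two terms is left open here because the text above does not specify it.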