Language-Guided Open-World Anomaly Segmentation
Klara Reichard, Nikolas Brasch, Nassir Navab, Federico Tombari
TL;DR
Clipomaly is a zero-shot, CLIP-based framework for open-world anomaly segmentation in autonomous driving. It segments known objects and assigns semantically meaningful labels to unknown regions at the same time, without anomaly-specific training. The method predicts unknown regions from dense CLIP embeddings, generates candidate labels through RAM or dictionary preselection, matches regions to labels with region-aware CLIP scoring, and finally performs open-vocabulary segmentation with the extended vocabulary. It achieves state-of-the-art anomaly segmentation on benchmarks such as RoadAnomaly and SMIYC AnomalyTrack while preserving accuracy on known classes and providing an interpretable vocabulary that can be extended dynamically at inference time. For deployment, the approach avoids continual retraining and delivers human-readable anomaly names that can aid downstream planning and control.
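The four stages named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The embeddings are small synthetic vectors rather than real CLIP features. Thresholding the maximum similarity to known classes is an assumed stand-in for the unknown-region predictor. The fixed candidate list stands in for RAM or dictionary preselection. Mean-pooling the region's pixel embeddings is an assumed simplification of region-aware CLIP scoring. All function names and the threshold value are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def unknown_mask(pixel_emb, known_text_emb, threshold=0.5):
    """Mark pixels whose best cosine similarity to any known class is low.
    Assumption: a simple threshold, not the paper's actual predictor."""
    sim = l2_normalize(pixel_emb) @ l2_normalize(known_text_emb).T  # (H, W, K)
    return sim.max(axis=-1) < threshold

def match_region_label(pixel_emb, mask, cand_text_emb):
    """Return the index of the candidate label closest to the region.
    Assumption: the region is summarized by mean-pooled pixel embeddings."""
    if not mask.any():
        return None
    region = l2_normalize(pixel_emb[mask].mean(axis=0))   # (D,)
    scores = l2_normalize(cand_text_emb) @ region          # (C,)
    return int(scores.argmax())

def segment(pixel_emb, text_emb):
    """Assign each pixel to the vocabulary entry with highest similarity."""
    sim = l2_normalize(pixel_emb) @ l2_normalize(text_emb).T
    return sim.argmax(axis=-1)

# Synthetic 4-dimensional "text embeddings" (hypothetical, for illustration).
basis = np.eye(4)
known_names, known_emb = ["road", "car"], basis[:2]
cand_names, cand_emb = ["dog", "traffic cone"], basis[2:]

# Synthetic 2x3 grid of dense "pixel embeddings".
pixels = np.array([
    [[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0], [0.1, 1.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.1, 0.0, 1.0, 0.1], [0.0, 0.1, 0.9, 0.0]],
])

mask = unknown_mask(pixels, known_emb)                 # stage 1: unknown regions
idx = match_region_label(pixels, mask, cand_emb)       # stages 2-3: label matching
label = cand_names[idx]

# Stage 4: extend the vocabulary at inference time and re-segment.
vocab = known_names + [label]
vocab_emb = np.vstack([known_emb, cand_emb[idx:idx + 1]])
seg = segment(pixels, vocab_emb)
```

In this toy setup the two bottom-right pixels fall below the similarity threshold, are matched to the candidate "dog", and are assigned to that new class once the vocabulary is extended, while the remaining pixels keep their known-class labels.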
Abstract
Open-world and anomaly segmentation methods seek to enable autonomous driving systems to detect and segment both known and unknown objects in real-world scenes. However, existing methods do not assign semantically meaningful labels to unknown regions, and distinguishing unknown classes and learning representations for them remains difficult. While open-vocabulary segmentation methods show promise in generalizing to novel classes, they require a fixed inference vocabulary and thus cannot be directly applied to anomaly segmentation, where unknown classes are unconstrained. We propose Clipomaly, the first CLIP-based open-world and anomaly segmentation method for autonomous driving. Our zero-shot approach requires no anomaly-specific training data and leverages CLIP's shared image-text embedding space to both segment unknown objects and assign human-interpretable names to them. Unlike existing open-vocabulary methods, our model dynamically extends its vocabulary at inference time without retraining, enabling robust detection and naming of anomalies beyond common class definitions such as those in Cityscapes. Clipomaly achieves state-of-the-art performance on established anomaly segmentation benchmarks while providing the interpretability and flexibility essential for practical deployment.
