UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting

Geonuk Kim, Minhoi Kim, Kangil Lee, Minsu Kim, Hyeonseong Jeon, Jeonghoon Han, Hyoungjoon Lim, Junho Yim

Abstract

Although industrial inspection systems should be capable of recognizing unprecedented defects, most existing approaches operate under a closed-set assumption, which prevents them from detecting novel anomalies. While visual prompting offers a scalable alternative for industrial inspection, existing methods often suffer from prompt embedding collapse due to high intra-class variance and subtle inter-class differences. To resolve this, we propose UniSpector, which shifts the focus from naive prompt-to-region matching to the principled design of a semantically structured and transferable prompt topology. UniSpector employs the Spatial-Spectral Prompt Encoder to extract orientation-invariant, fine-grained representations; these serve as a solid basis for the Contrastive Prompt Encoder to explicitly regularize the prompt space into a semantically organized angular manifold. Additionally, Prompt-guided Query Selection generates adaptive object queries aligned with the prompt. We introduce Inspect Anything, the first benchmark for visual-prompt-based open-set defect localization, where UniSpector significantly outperforms baselines by at least 19.7% and 15.8% in AP50b and AP50m, respectively. These results show that our method enables a scalable, retraining-free inspection paradigm for continuously evolving industrial environments, while offering critical insights into the design of generic visual prompting.

Paper Structure

This paper contains 30 sections, 8 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of visual inspection paradigms: (a) closed-set detectors fail on novel defect types and require costly retraining, (b) anomaly detectors cannot distinguish between defect classes, and (c) visual prompting enables open-set recognition by aligning unseen defects with exemplar prompts, providing a scalable visual inspection framework.
  • Figure 2: Examples from the InsA benchmark. Top: Samples from the same defect class showing high intra-class appearance variance. Bottom: Samples from different classes exhibiting similar visual patterns, resulting in low inter-class separability. Such ambiguities highlight the inherent difficulty of defect recognition.
  • Figure 3: Overview of UniSpector, an open-set defect detection and segmentation framework. The Spatial-Spectral Prompt Encoder extracts orientation-invariant spectral cues fused with spatial features to distinguish visually similar defects. Building on these, Contrastive Prompt Encoding regularizes the prompt embedding space into a structured manifold for robust open-set generalization. A Prompt-guided Query Selection mechanism dynamically selects prompt-aware queries. Refer to the Appendix for the inference phase architecture.
  • Figure 4: 3D PCA projection of L2-normalized prompt embeddings learned by UniSpector (left) and DINOv (right). Circle markers ($\circ$) denote seen sets and cross markers ($\times$) denote unseen sets.
  • Figure 5: Intra-class cosine similarity comparison across seen and unseen defect classes on RealIAD.
  • ...and 5 more figures
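
The captions for Figures 4 and 5 refer to L2-normalized prompt embeddings and intra-class cosine similarity. As a minimal sketch of how such a per-class score could be computed, the snippet below averages the pairwise cosine similarity among the embeddings of each class. The function names, the toy vectors, and the class labels are hypothetical and chosen only for illustration; the paper's actual evaluation protocol is not given in this section.

```python
import math
from collections import defaultdict
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def intra_class_cosine_similarity(embeddings, labels):
    """Return {label: mean pairwise cosine similarity} for classes with >= 2 samples."""
    by_class = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        by_class[label].append(l2_normalize(vec))

    scores = {}
    for label, members in by_class.items():
        if len(members) < 2:
            continue  # the score is undefined for a single sample
        sims = [
            sum(a * b for a, b in zip(u, v))
            for u, v in combinations(members, 2)
        ]
        scores[label] = sum(sims) / len(sims)
    return scores


# Toy 2-D embeddings with hypothetical class names (not taken from the paper).
toy_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_labels = ["scratch", "scratch", "dent", "dent"]
scores = intra_class_cosine_similarity(toy_embeddings, toy_labels)
```

Under this definition, a score near 1 means the embeddings of a class point in nearly the same direction, while a low or negative score means they are spread out on the unit sphere.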