Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection

Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard

TL;DR

This work addresses zero-shot anomaly detection under domain shift by introducing PILOT, a dual-branch prompt learning framework that combines a learnable prompt pool with a fixed attribute memory bank and a label-free test-time adaptation strategy. By fusing adaptive prompts and semantic anchors through orthogonal projection and selectively updating prompts based on high-confidence pseudo-labels, PILOT achieves state-of-the-art anomaly detection and localization across 13 industrial and medical benchmarks without target-domain labels. The approach preserves CLIP's strengths while mitigating overfitting and distributional gaps, enabling robust performance under challenging cross-domain conditions. The results highlight the practical impact of zero-shot methods with targeted test-time adaptation for safety-critical inspection and diagnostics, reducing the need for labeled anomalies in deployment scenarios.

Abstract

Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories by relying solely on generalizable features rather than requiring any labeled examples of anomalies. However, existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because they are trained on data from a limited set of domains and fail to generalize to new distributions. In this paper, we introduce PILOT, a framework designed to overcome these challenges through two key innovations: (1) a novel dual-branch prompt learning mechanism that dynamically integrates a pool of learnable prompts with structured semantic attributes, enabling the model to adaptively weight the most relevant anomaly cues for each input image; and (2) a label-free test-time adaptation strategy that updates the learnable prompt parameters using high-confidence pseudo-labels from unlabeled test data. Extensive experiments on 13 industrial and medical benchmarks demonstrate that PILOT achieves state-of-the-art performance in both anomaly detection and localization under domain shift.
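
The second innovation, updating prompts only from high-confidence pseudo-labels, can be illustrated with a small sketch. This is not the paper's implementation: the function name `select_pseudo_labels`, the assumption that anomaly scores lie in [0, 1], and the thresholds `low` and `high` are all hypothetical choices made here for illustration.

```python
from typing import List, Tuple


def select_pseudo_labels(
    scores: List[float], low: float = 0.1, high: float = 0.9
) -> List[Tuple[int, int]]:
    """Keep only test samples whose anomaly score is confidently low or high.

    Returns (sample_index, pseudo_label) pairs, where label 1 means
    "anomalous" and label 0 means "normal". Samples with ambiguous scores
    are skipped, so they would not contribute to any prompt update.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for index, score in enumerate(scores):
        if score >= high:
            selected.append((index, 1))  # confidently anomalous
        elif score <= low:
            selected.append((index, 0))  # confidently normal
    return selected


# Hypothetical image-level anomaly scores for five unlabeled test images.
scores = [0.02, 0.55, 0.97, 0.40, 0.08]
print(select_pseudo_labels(scores))  # [(0, 0), (2, 1), (4, 0)]
```

In a full test-time adaptation loop, only the selected pairs would feed the loss used to update the learnable prompt parameters, which is how ambiguous samples are kept from pulling the prompts in the wrong direction.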

Paper Structure

This paper contains 37 sections, 11 equations, 25 figures, 16 tables, 2 algorithms.

Figures (25)

  • Figure 1: Overview of PILOT's training phase. The left panel shows the main workflow. The right panel details the Learnable Prompt Pool $\mathcal{P}$ and Attribute Memory Bank $\mathcal{U}$.
  • Figure 2: Histogram of normalized projection residuals for normal and anomaly prompts (see Eq. \ref{eq:fuse-main}). The y-axis shows the fraction of samples per bin. Higher values indicate greater orthogonality to the anchor, while lower values reflect stronger alignment.
  • Figure 3: (a) Prompt-pair similarity distributions: each horizontal boxplot shows absolute cosine similarity between learned prompt embeddings. Values close to 0 indicate high diversity. (b) Prompt contribution: normalized contribution of a learnable prompt from the pool $\mathcal{P}$, aggregated over test images. Higher values indicate greater involvement during TTA.
  • Figure 4: Visualization of input images (top), ground‐truth masks (middle), and model‐generated anomaly maps (bottom) for the bottle category in the MVTec AD dataset. All results are generated by PILOT.
  • Figure 5: Visualization of input images (top), ground‐truth masks (middle), and model‐generated anomaly maps (bottom) for the grid category in the MVTec AD dataset. All results are generated by PILOT.
  • ...and 20 more figures
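
The quantity plotted in Figure 2 can be made concrete with a short sketch. The exact definition in the paper's fusion equation is not reproduced in this section, so the version below is an assumption: the residual left after projecting a prompt embedding onto an anchor, divided by the prompt's norm. The function name `normalized_projection_residual` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def normalized_projection_residual(
    prompt: Sequence[float], anchor: Sequence[float]
) -> float:
    """Norm of the prompt component orthogonal to the anchor, over the prompt norm.

    The result is 0.0 when the prompt is parallel to the anchor (full
    alignment) and 1.0 when it is orthogonal to the anchor. This is one
    plausible reading of "normalized projection residual", not the paper's
    confirmed formula.
    """
    dot = sum(p * a for p, a in zip(prompt, anchor))
    anchor_sq = sum(a * a for a in anchor)
    prompt_norm = math.sqrt(sum(p * p for p in prompt))
    if anchor_sq == 0.0 or prompt_norm == 0.0:
        raise ValueError("prompt and anchor must be non-zero vectors")
    coeff = dot / anchor_sq  # scalar projection coefficient onto the anchor
    residual = [p - coeff * a for p, a in zip(prompt, anchor)]
    return math.sqrt(sum(r * r for r in residual)) / prompt_norm


print(normalized_projection_residual([0.0, 1.0], [1.0, 0.0]))  # 1.0 (orthogonal)
print(normalized_projection_residual([2.0, 0.0], [1.0, 0.0]))  # 0.0 (aligned)
```

Under this reading, a histogram concentrated near 1 would mean that most prompts carry information largely independent of the anchor, which matches the caption's statement that higher values indicate greater orthogonality.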