On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection

Tai Le-Gia

TL;DR

This work tackles zero-shot anomaly detection by exposing and addressing consistent anomalies: recurring, similar defects that bias distance-based, batch-oriented methods. It introduces CoDeGraph, a training-free graph framework that leverages neighbor-burnout and similarity scaling to identify and prune consistent anomalies, yielding strong improvements on both anomaly classification and segmentation, especially on datasets with recurring defects. The work further provides a theoretical foundation for the observed similarity scaling via extreme value theory and local manifold geometry, and shows how to extend batch-based methods to 3D MRI volumes in a training-free fashion. Finally, it bridges batch-based and text-based zero-shot methods through pseudo-masks, which let downstream vision-language models benefit from unsupervised structural cues, and reports promising, though preliminary, cross-domain results. Altogether, these contributions offer a principled, scalable pathway to robust zero-shot AC/AS across 2D and 3D modalities, with practical implications for industrial and medical imaging workflows.

Abstract

Zero-shot anomaly classification and segmentation (AC/AS) aim to detect anomalous samples and regions without any training data, a capability increasingly crucial in industrial inspection and medical imaging. This dissertation investigates the core challenges of zero-shot AC/AS and presents principled solutions rooted in theory and algorithmic design. We first formalize the problem of consistent anomalies, a failure mode in which recurring similar anomalies systematically bias distance-based methods. By analyzing the statistical and geometric behavior of patch representations from pre-trained Vision Transformers, we identify two key phenomena, similarity scaling and neighbor-burnout, that describe how relationships among normal patches change with and without consistent anomalies in settings characterized by highly similar objects. We then introduce CoDeGraph, a graph-based framework for filtering consistent anomalies that is built on these two phenomena. Through multi-stage graph construction, community detection, and structured refinement, CoDeGraph suppresses the influence of consistent anomalies. Next, we extend this framework to 3D medical imaging by proposing a training-free, computationally efficient volumetric tokenization strategy for MRI data. This enables a genuinely zero-shot 3D anomaly detection pipeline and shows that volumetric anomaly segmentation is achievable without any 3D training samples. Finally, we bridge batch-based and text-based zero-shot methods by demonstrating that CoDeGraph-derived pseudo-masks can supervise prompt-driven vision-language models. Together, these contributions provide theoretical understanding and practical solutions for the zero-shot AC/AS problem.

Paper Structure

This paper contains 101 sections, 4 theorems, 113 equations, 23 figures, 11 tables, 2 algorithms.

Key Result

Theorem 4.1

The log-spacings $\ln Y_{(i+1)} - \ln Y_{(i)}$ are independent and follow $\mathrm{Exp}(\alpha_0 i)$ for $i = 1, \ldots, \omega-1$.
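
The statement can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the dissertation's experiment: it assumes that $Y_{(1)} \le \cdots \le Y_{(\omega)}$ are the order statistics of $\omega$ i.i.d. draws with exact power-law CDF $P(Y \le y) = y^{\alpha_0}$ on $(0, 1]$, and that $\mathrm{Exp}(\lambda)$ denotes the exponential distribution with rate $\lambda$ (mean $1/\lambda$). The function name `simulate_log_spacings` and the values of `alpha0`, `omega`, and `trials` are hypothetical choices made for the example.

```python
import numpy as np


def simulate_log_spacings(alpha0, omega, trials, seed=0):
    """Simulate ln Y_(i+1) - ln Y_(i), i = 1..omega-1, under a toy power law.

    Assumption (not taken from the source): Y = U**(1/alpha0) with U uniform
    on (0, 1], so that P(Y <= y) = y**alpha0.
    """
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random((trials, omega))   # uniform on (0, 1], avoids log(0)
    y = u ** (1.0 / alpha0)                 # inverse-CDF sampling of the power law
    y_sorted = np.sort(y, axis=1)           # ascending order statistics per trial
    return np.diff(np.log(y_sorted), axis=1)  # shape (trials, omega - 1)


alpha0, omega = 2.0, 6
spacings = simulate_log_spacings(alpha0, omega, trials=20000)

# Under the rate reading of Exp(alpha0 * i), the i-th spacing has mean 1 / (alpha0 * i).
empirical_mean = spacings.mean(axis=0)
theoretical_mean = 1.0 / (alpha0 * np.arange(1, omega))

for i, (emp, theo) in enumerate(zip(empirical_mean, theoretical_mean), start=1):
    print(f"i={i}: empirical mean {emp:.4f}, Exp(alpha0*i) mean {theo:.4f}")
```

In this toy setting the empirical means should track $1/(\alpha_0 i)$, and the sample correlation between different spacings should be close to zero, consistent with the independence claim. The theorem's actual hypotheses are stated in the full text and may be weaker than the exact power law assumed here.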

Figures (23)

  • Figure 1: Vision Transformer architecture.
  • Figure 2: Illustration of the consistent-anomaly problem in zero-shot anomaly detection. Industrial images have normal patches (blue squares) that match nearly all test images. Scratches and other random anomalies receive high anomaly scores because they fail to find similar matches across the test set. Defects from consistent-anomaly images (flipped metal nuts) easily find deceptive matches within the images (orange region) that share the same anomaly pattern (rotated counter-clockwise instead of clockwise).
  • Figure 3: Representative examples of consistent anomalies. Top: multiple cable samples from MVTec AD [mvtec] exhibiting the same "missing components" defect. Bottom: multiple brain MRI scans from BraTS-2025 [brats] showing similar enhancing tumor regions with high signal intensity on post-contrast T2-weighted images. These examples illustrate that consistent anomalies appear frequently in both industrial and medical domains.
  • Figure 4: Similarity scaling in ViT-L/14@336.
  • Figure 5: Similarity scaling in DINOv2.
  • ...and 18 more figures

Theorems & Definitions (19)

  • Definition 2.1: $\epsilon$-neighbors and $\epsilon$-consistency
  • Definition 2.2: Semantic threshold $\epsilon_0$
  • Definition 2.3: $\epsilon$-consistent anomaly
  • Remark 2.1
  • Definition 2.4: Consistent anomaly problem
  • Remark 2.2
  • Remark 3.1
  • Remark 3.2
  • Definition 4.1: Slowly Varying Function
  • Definition 4.2
  • ...and 9 more