UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang

TL;DR

UniVAD tackles the challenge of cross-domain visual anomaly detection by eliminating domain-specific training and adopting a unified, few-shot framework. It combines a Contextual Component Clustering ($C^3$) module for accurate component segmentation with Component-Aware Patch Matching (CAPM) and Graph-Enhanced Component Modeling (GECM) to detect anomalies at structural and logical semantic levels, respectively. The approach is validated across nine datasets spanning industrial, logical, and medical domains, achieving state-of-the-art results under few-normal-shot conditions and offering an adapter-based path for few-abnormal-shot domain adaptation. This work advances standardization and practical deployment of VAD by enabling robust anomaly detection with minimal labeled data and without extensive domain-specific training.

Abstract

Visual Anomaly Detection (VAD) aims to identify abnormal samples in images that deviate from normal patterns, covering multiple domains, including industrial, logical, and medical fields. Due to the domain gaps between these fields, existing VAD methods are typically tailored to each domain, with specialized detection techniques and model architectures that are difficult to generalize across different domains. Moreover, even within the same domain, current VAD approaches often follow a "one-category-one-model" paradigm, requiring large numbers of normal samples to train class-specific models, resulting in poor generalizability and hindering unified evaluation across domains. To address this issue, we propose a generalized few-shot VAD method, UniVAD, capable of detecting anomalies across various domains, such as industrial, logical, and medical anomalies, with a training-free unified model. UniVAD needs only a few normal samples as references during testing to detect anomalies in previously unseen objects, without training on the specific domain. Specifically, UniVAD employs a Contextual Component Clustering ($C^3$) module based on clustering and vision foundation models to segment components within the image accurately, and leverages Component-Aware Patch Matching (CAPM) and Graph-Enhanced Component Modeling (GECM) modules to detect anomalies at different semantic levels, which are aggregated to produce the final detection result. We conduct experiments on nine datasets spanning industrial, logical, and medical fields, and the results demonstrate that UniVAD achieves state-of-the-art performance in few-shot anomaly detection tasks across multiple domains, outperforming domain-specific anomaly detection models. Code is available at https://github.com/FantasticGNU/UniVAD.
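
To make the idea of matching query patches against a few normal references more concrete, the following is a minimal sketch, not the authors' implementation. It assumes patch features and per-patch component labels have already been extracted (in the paper these would come from a vision backbone and the $C^3$ module). The function names `patch_matching_scores` and `aggregate`, the cosine-distance scoring, the fallback distance of 1.0, and the simple weighted sum are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def patch_matching_scores(query_feats, query_comp, ref_feats, ref_comp):
    """Score each query patch by its distance to the nearest reference patch
    that belongs to the same component (hypothetical stand-in for CAPM).

    query_feats: (Nq, D) patch features of the test image
    query_comp:  (Nq,)   integer component id per query patch
    ref_feats:   (Nr, D) patch features pooled from the normal reference images
    ref_comp:    (Nr,)   integer component id per reference patch
    Returns an (Nq,) array of cosine distances; higher means more anomalous.
    """
    q = l2_normalize(np.asarray(query_feats, dtype=float))
    r = l2_normalize(np.asarray(ref_feats, dtype=float))
    query_comp = np.asarray(query_comp)
    ref_comp = np.asarray(ref_comp)

    # Assumption: a patch whose component has no reference patches gets distance 1.0.
    scores = np.ones(len(q))
    for comp in np.unique(query_comp):
        q_idx = np.where(query_comp == comp)[0]
        r_idx = np.where(ref_comp == comp)[0]
        if r_idx.size == 0:
            continue
        sim = q[q_idx] @ r[r_idx].T           # cosine similarities
        scores[q_idx] = 1.0 - sim.max(axis=1)  # distance to the nearest neighbour
    return scores


def aggregate(structural, logical, weight=0.5):
    """Combine two per-patch score maps with an assumed weighted sum."""
    return weight * np.asarray(structural) + (1.0 - weight) * np.asarray(logical)


# Toy usage: two reference patches, one per component.
ref_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_comp = np.array([0, 1])
# Both query patches look like component 0, but the second sits in component 1.
query_feats = np.array([[1.0, 0.0], [1.0, 0.0]])
query_comp = np.array([0, 1])

scores = patch_matching_scores(query_feats, query_comp, ref_feats, ref_comp)
final = aggregate(scores, np.zeros_like(scores))
```

In this toy case the second query patch is flagged because its feature does not resemble any reference patch of the component it falls in, which is the intuition behind restricting matching to the same component. The zero-valued second score map is only a placeholder for a logical-level score such as the one GECM would produce.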

Paper Structure

This paper contains 30 sections, 14 equations, 9 figures, 11 tables, 4 algorithms.

Figures (9)

  • Figure 1: 1-shot performance of existing VAD methods and UniVAD across different datasets in various domains. UniVAD achieves state-of-the-art results across multiple datasets and domains, outperforming specialized methods in each domain.
  • Figure 2: Comparison between UniVAD and existing VAD methods. Existing VAD methods are specifically designed for each domain, whereas UniVAD can perform anomaly detection tasks across multiple domains using a unified model.
  • Figure 3: The overall architecture of UniVAD. Given an input image, UniVAD first generates masks for each entity using the Contextual Component Clustering module (Sec 3.2). UniVAD then applies the Component-Aware Patch Matching module (Sec 3.3) and the Graph-Enhanced Component Modeling module (Sec 3.4) to detect structural and logical anomalies. The outputs from both expert modules are combined to produce the final unified anomaly detection result.
  • Figure 4: Architecture of the $C^3$ module.
  • Figure 5: Visualization results of UniVAD on datasets across diverse domains. UniVAD demonstrates strong transferability by accurately segmenting previously unseen samples with only a single normal sample provided as reference.
  • ...and 4 more figures