A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect

Yunkang Cao, Xiaohao Xu, Jiangning Zhang, Yuqi Cheng, Xiaonan Huang, Guansong Pang, Weiming Shen

TL;DR

This survey frames Visual Anomaly Detection (VAD) around three core challenges: data scarcity, diverse visual modalities, and the hierarchical nature of anomalies. It systematically reviews progress through three lenses: sample regime (semi-supervised, unsupervised, few-shot, zero-shot), data modality (2D RGB, 3D, multimodal), and anomaly hierarchy (structural vs. semantic). It also discusses representative methods, evaluation metrics, and industrial datasets such as MVTec AD and MVTec LOCO. The paper highlights key trends, including the rise of memory-bank methods and diffusion-based anomaly generation in unsupervised and zero-shot settings, and the growing importance of multimodal fusion and relational reasoning for semantic anomalies. Looking forward, it identifies generic VAD via foundation models, scalable data generation, multimodal learning, and holistic integration with downstream tasks as the main directions for robust, real-world deployment.

Abstract

Visual Anomaly Detection (VAD) aims to pinpoint deviations from the concept of normality in visual data and is widely applied across diverse domains, e.g., industrial defect inspection and medical lesion detection. This survey comprehensively examines recent advancements in VAD by identifying three primary challenges: 1) scarcity of training data, 2) diversity of visual modalities, and 3) complexity of hierarchical anomalies. Starting with a brief overview of the VAD background and its generic concept definitions, we progressively categorize, emphasize, and discuss the latest VAD progress from the perspectives of sample number, data modality, and anomaly hierarchy. Through an in-depth analysis of the VAD field, we finally outline future developments for VAD and summarize the key findings and contributions of this survey.

Paper Structure

This paper contains 34 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Major VAD challenges (Top) and taxonomies (Bottom).
  • Figure 2: Number of VAD publications regarding taxonomies under the three perspectives in Sec. 3. Blue for Sec. 3.1, green for Sec. 3.2, and orange for Sec. 3.3.