A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection

Yuxuan Lin, Yang Chang, Xuan Tong, Jiawen Yu, Antonio Liotta, Guofan Huang, Wei Song, Deyu Zeng, Zongze Wu, Yan Wang, Wenqiang Zhang

TL;DR

This survey addresses unsupervised industrial image anomaly detection (UIAD) by comparing RGB, 3D, and multimodal approaches. It systematizes datasets, evaluation metrics, and method families, emphasizing three main paradigms for RGB UIAD (feature embedding, reconstruction, and large-model frameworks) and extending the analysis to 3D and multimodal settings with diverse fusion strategies. Key contributions include a taxonomy of methods, a catalog of datasets and metrics, and a synthesis of current challenges and future directions for robust, deployable UIAD systems. The work provides practical insights and a GitHub resource to guide researchers and practitioners toward scalable, cross-modal anomaly detection in industrial environments.

Abstract

In the advancement of industrial informatization, unsupervised anomaly detection technology effectively overcomes the scarcity of abnormal samples and significantly enhances the automation and reliability of smart manufacturing. As an important branch, industrial image anomaly detection focuses on automatically identifying visual anomalies in industrial scenarios (such as product surface defects, assembly errors, and equipment appearance anomalies) through computer vision techniques. With the rapid development of Unsupervised Industrial Image Anomaly Detection (UIAD), excellent detection performance has been achieved not only in the RGB setting but also in the 3D and multimodal (RGB and 3D) settings. However, existing surveys primarily focus on UIAD tasks in the RGB setting, with little discussion of the 3D and multimodal settings. To address this gap, this article provides a comprehensive review of UIAD tasks in all three modality settings. Specifically, we first introduce the task concept and process of UIAD. We then give an overview of the research on UIAD in the three settings (RGB, 3D, and multimodal), including datasets and methods, and review feature fusion strategies in the multimodal setting. Finally, we summarize the main challenges faced by UIAD tasks in the three settings and offer insights into future development directions, aiming to provide researchers with a comprehensive reference and new perspectives for the advancement of industrial informatization. Corresponding resources are available at https://github.com/Sunny5250/Awesome-Multi-Setting-UIAD.

Paper Structure

This paper contains 59 sections, 3 equations, 18 figures, 9 tables.

Figures (18)

  • Figure 1: Examples of some classes in the MVTec 3D-AD dataset and the unsupervised industrial anomaly detection (UIAD) pipeline. (a) Taking RGB, 3D, and multimodal samples as examples, we find that different modalities of the same object can compensate for the information limitations of a single modality, enabling the detection of more types of anomalies. (b) The training and inference phases of UIAD.
  • Figure 2: Citations over time for selected RGB and multimodal datasets (green for RGB, blue for multimodal). Among RGB datasets, MVTec AD has the highest citation ratio, reflecting its profound impact on the field of RGB anomaly detection. Citations of multimodal datasets have also increased year by year since the datasets were proposed, indicating that multimodal information is receiving more and more attention. Citation counts were obtained from Google Scholar.
  • Figure 3: Roadmap of RGB, 3D, and multimodal Unsupervised Industrial Anomaly Detection (UIAD). We conduct an in-depth analysis of RGB, 3D, and multimodal UIAD, summarizing existing UIAD methods from the perspectives of input types, architectures, and learning methods. We identify the commonalities and distinctions of UIAD methods in different settings. Based on this analysis, we discuss the challenges faced by UIAD methods in different settings and the corresponding potential future research directions.
  • Figure 4: Process of the teacher-student architecture paradigm.
  • Figure 5: Process of the one-class classification paradigm.
  • ...and 13 more figures