
Component-aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection

Xuan Tong, Yang Chang, Qing Zhao, Jiawen Yu, Boyang Wang, Junxiong Lin, Yuxuan Lin, Xinji Mai, Haoran Wang, Zeng Tao, Yan Wang, Wenqiang Zhang

TL;DR

This work tackles industrial anomaly detection under the constraint of scarce anomalous data by introducing ComGEN, a component-aware, unsupervised framework that treats logical anomalies as a compositional problem. It combines Multi-Component Learning to disentangle object components, Prompt Modifications and Low-Density Sampling to generate diverse anomalies, and a memory-based Reference Neighbor Association with Text-to-Component Residual Mapping to produce precise anomaly masks. The detection backbone CSDA fuses multi-scale features with residual guidance to accelerate learning, while a tailored loss balances localization and segmentation quality. Empirical results on MVTecLOCO show state-of-the-art generation and detection performance (AUROC up to 91.2%), with successful transfer to real-world Diesel Engine data and MVTecAD, demonstrating practical impact for manufacturing pipelines.
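The memory-based Reference Neighbor Association step can be read as a nearest-neighbour lookup: given the feature vector of a generated anomaly, retrieve the most similar normal sample stored in a memory bank. The sketch below is a minimal illustration under stated assumptions. It assumes features are already extracted as plain vectors and uses Euclidean distance, and it ignores the iterative multi-scale matching the paper describes. The names `nearest_reference` and `memory_bank` are hypothetical and do not come from ComGEN's code.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_reference(query: Sequence[float],
                      memory_bank: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, distance) of the stored normal feature closest to `query`.

    Hypothetical helper: a brute-force scan over the memory bank; ties go to
    the earliest entry.
    """
    if not memory_bank:
        raise ValueError("memory bank is empty")
    distances = [l2_distance(query, ref) for ref in memory_bank]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical 3-D embeddings of three normal images.
    bank = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
    idx, dist = nearest_reference([0.9, 1.2, 0.0], bank)
    print(idx, round(dist, 4))  # 1 0.2236
```

A brute-force scan keeps the example dependency-free. For a large memory bank, an approximate nearest-neighbour index would be the usual substitute.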

Abstract

Anomaly detection is critical in industrial manufacturing for ensuring product quality and improving the efficiency of automated processes. The scarcity of anomalous samples limits traditional detection methods, making anomaly generation essential for expanding the data repository. However, recent generative models often produce unrealistic anomalies that increase false positives, or they require real-world anomaly samples for training. In this work, we treat anomaly generation as a compositional problem and propose ComGEN, a component-aware, unsupervised framework that addresses the gap in logical anomaly generation. Our method uses a multi-component learning strategy to disentangle visual components, followed by generation-editing procedures. The disentangled text-to-component pairs, which reveal intrinsic logical constraints, drive attention-guided residual mapping and model training with iteratively matched references across multiple scales. Experiments on the MVTecLOCO dataset confirm the efficacy of ComGEN, which achieves the best AUROC score of 91.2%. Additional experiments on the real-world Diesel Engine scenario and the widely used MVTecAD dataset show significant performance improvements when simulated anomalies generated by ComGEN are integrated into automated production workflows.
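The headline result is an AUROC of 91.2%. AUROC equals the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as half. The sketch below computes that quantity directly from scores and binary labels. The summary does not say whether the reported figure is image-level or pixel-level. The same function applies to per-pixel scores once they are flattened. The function name `auroc` is illustrative.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = anomalous, 0 = normal. Each anomalous/normal pair contributes
    1 if the anomaly scores higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is O(n·m) and is meant for clarity. A rank-based formulation gives the same value in O(n log n) for large evaluation sets.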

Paper Structure

This paper contains 13 sections, 12 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison between existing anomaly generation methods and ours. (1) Anomaly generation results on the MVTecLOCO dataset bergmann2022beyond; (2) Anomaly generation results on the MVTecAD dataset bergmann2019mvtec and the real-world Diesel Engine; (3) Anomaly localization results on MVTecLOCO (ground-truth masks in the lower-right corner); (4) Comparison of network architectures.
  • Figure 2: The pipeline of ComGEN, consisting of three stages: Anomaly Generation, Mask Generation and Anomaly Detection. I. Multi-Component Learning (MCL) disentangles image regions to align text tokens with components; Prompt Modifications (PM) and Low-density Sampling (LS) then enhance generation. II. Reference Neighbor Association (RNA) searches for the normal samples closest to the generated anomalies, and both are fed to Residual Mapping (RM) to generate masks. III. Their differential features are fed to the Cross-Scale Difference Aggregation Module (CSDA) to accelerate learning.
  • Figure 3: Localization and generation results of ComGEN on the MVTecLOCO dataset. Logical and Structural Anomalies: Comparison of detection results between existing methods and ours. Generated Anomalies: Images with masks (bottom-right). Seg: Segmented components based on cross-attention maps.
  • Figure 4: Generation and localization results. Data Collection: Hardware-based data acquisition environment. Casting Surfaces: Multi-surface results on Diesel Engine, where generated anomalies are marked with red boxes and heat maps highlight anomalous regions (ground-truth masks in the lower-right corner). MVTec AD: Generation results for object categories (from Bottle to Screw) and texture categories (from Carpet to Leather), where texture anomalies from Anodiff (left) are compared with ours (right).
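For intuition only, stages II and III of the Figure 2 pipeline can be approximated as follows. An anomaly is compared with its retrieved normal reference at several feature scales. Each per-scale residual is resized to a common grid, the residuals are averaged, and the result is thresholded into a mask. This is a toy, hand-written simplification under stated assumptions. The real RM and CSDA modules are learned and attention-guided, and nearest-neighbour resizing, plain averaging, a fixed threshold, and all function names here are hypothetical choices.

```python
from typing import List

Map = List[List[float]]


def abs_diff(a: Map, b: Map) -> Map:
    """Element-wise |a - b| for two maps of the same shape (the residual)."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def upsample_nearest(m: Map, height: int, width: int) -> Map:
    """Nearest-neighbour resize of a 2-D map to (height, width)."""
    h, w = len(m), len(m[0])
    return [[m[i * h // height][j * w // width] for j in range(width)]
            for i in range(height)]


def aggregate_differences(anomaly_scales: List[Map], reference_scales: List[Map],
                          height: int, width: int) -> Map:
    """Average per-scale residuals after resizing them to a common grid."""
    resized = [upsample_nearest(abs_diff(a, r), height, width)
               for a, r in zip(anomaly_scales, reference_scales)]
    n = len(resized)
    return [[sum(m[i][j] for m in resized) / n for j in range(width)]
            for i in range(height)]


def to_mask(residual: Map, threshold: float) -> List[List[int]]:
    """Binarise the aggregated residual into a 0/1 anomaly mask."""
    return [[1 if v > threshold else 0 for v in row] for row in residual]


if __name__ == "__main__":
    fine_a, fine_r = [[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
    coarse_a, coarse_r = [[0.5]], [[0.0]]
    agg = aggregate_differences([fine_a, coarse_a], [fine_r, coarse_r], 2, 2)
    print(agg)               # [[0.25, 0.25], [0.25, 0.75]]
    print(to_mask(agg, 0.5))  # [[0, 0], [0, 1]]
```

Averaging lets a coarse-scale residual raise evidence over a broad region while the fine scale pins down its location. A learned module such as CSDA would weight the scales instead of averaging them uniformly.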