
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, Dacheng Tao

TL;DR

This work addresses the limitations of current anomaly detection (AD) benchmarks by introducing COCO-AD, a large-scale, general-purpose dataset for multi-class unsupervised anomaly detection (MUAD) derived from COCO, and by proposing four practical, threshold-dependent metrics tailored for AD. It presents InvAD, a GAN-inversion-inspired feature inversion framework with a Space-aware Style Modulation module that enables high-quality, input-dependent feature reconstruction for accurate multi-class anomaly localization. The paper reports strong, consistent performance for InvAD across COCO-AD, MVTec AD, and VisA, with extensive ablations validating each component and an InvAD-lite variant that trades some capacity for efficiency. Together, COCO-AD, the new metrics, and InvAD move MUAD toward scalable, real-world applicability and provide a benchmark for continued progress in visual anomaly detection.

Abstract

Anomaly detection (AD) often focuses on detecting anomalous regions for industrial quality inspection and medical lesion examination. However, because its target scenarios are so specific, the data scale for AD is relatively small, and evaluation metrics are still deficient compared to classic vision tasks such as object detection and semantic segmentation. To fill these gaps, this work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field. This enables fair evaluation and sustainable development for different methods on this challenging benchmark. Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods. Inspired by the metrics in the segmentation field, we further propose several more practical threshold-dependent AD-specific metrics, i.e., m$F_1{}^{.2}_{.8}$, mAcc$^{.2}_{.8}$, mIoU$^{.2}_{.8}$, and mIoU-max. Motivated by GAN inversion's high-quality reconstruction capability, we propose a simple but more powerful InvAD framework to achieve high-quality feature reconstruction. Our method improves the effectiveness of reconstruction-based methods on the popular MVTec AD and VisA datasets and on our newly proposed COCO-AD dataset under a multi-class unsupervised setting, where only a single detection model is trained to detect anomalies from different classes. Extensive ablation experiments demonstrate the effectiveness of each component of InvAD. Full code and models are available at https://github.com/zhangzjn/ader.
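
To make the idea of a threshold-dependent metric concrete, here is a minimal sketch of an mIoU$^{.2}_{.8}$-style and mIoU-max-style computation. The abstract does not define the metrics, so the reading used here is an assumption: the sub/superscript is taken to mean "average over binarization thresholds from 0.2 to 0.8", the step of 0.1 is assumed, anomaly scores are assumed to be normalized to [0, 1], and the maximum is taken over the same threshold grid. The function names are hypothetical and not taken from the released code.

```python
def iou_at_threshold(scores, labels, threshold):
    """IoU between the binarized score map and the ground-truth mask.

    `scores` and `labels` are flat, equal-length sequences (one entry per pixel).
    """
    pred = [s >= threshold for s in scores]
    inter = sum(1 for p, g in zip(pred, labels) if p and g)
    union = sum(1 for p, g in zip(pred, labels) if p or g)
    # Convention (assumed): an empty prediction on an empty mask counts as perfect.
    return inter / union if union else 1.0


def threshold_sweep_iou(scores, labels, lo=0.2, hi=0.8, step=0.1):
    """Return (mean IoU, max IoU) over thresholds lo, lo+step, ..., hi.

    The grid [0.2, 0.8] with step 0.1 is an assumption about what the
    ".2/.8" annotation in the metric names denotes.
    """
    n = round((hi - lo) / step) + 1
    # Round to suppress floating-point drift such as 0.30000000000000004.
    thresholds = [round(lo + i * step, 10) for i in range(n)]
    ious = [iou_at_threshold(scores, labels, t) for t in thresholds]
    return sum(ious) / len(ious), max(ious)


# Toy example: six pixels, the last three are anomalous.
scores = [0.05, 0.1, 0.3, 0.55, 0.9, 0.95]
labels = [0, 0, 0, 1, 1, 1]
mean_iou, best_iou = threshold_sweep_iou(scores, labels)
print(round(mean_iou, 3), best_iou)
```

Unlike AU-ROC, which depends only on how scores are ranked, a sweep like this penalizes score maps that rank pixels correctly but are poorly calibrated, which is one plausible reason such metrics separate methods that AU-ROC no longer distinguishes.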

Paper Structure (21 sections, 3 equations, 8 figures, 22 tables)

Figures (8)

  • Figure 1: Left: Comparison among representative anomaly detection datasets and our proposed general-purpose COCO-AD. Middle: Example visualization of RD on the MVTec AD dataset. The excessively high values of current metrics (especially mAU-ROC) do not align with the visualization results, while our segmentation-inspired metrics provide more objective outcomes. Right: Schematic comparison between RD and our feature inversion framework.
  • Figure 2: Left: Schematic AD task. Right: Performance comparison among our InvAD and SoTA methods.
  • Figure 3: Anomaly score distribution on object Hazelnut and texture Toothbrush. Our method exhibits less overlap between normal and abnormal values and has a clearer demarcation line. Furthermore, the values tend to be closer to 0 (normal) and 1 (anomaly), respectively.
  • Figure 4: Left: Feature inversion concept inspired by GAN inversion. Right: Overview of the proposed InvAD framework, which consists of four components in tandem: 1) Multi-scale features $\bm{F}^{I}$ extracted by the image encoder $\bm{\phi}^{I}$ are aggregated into low-resolution $\bm{F}^{f}$ to avoid spatial consistency mapping; 2) A re-scaling upsampler $\bm{\phi}^{R}$ obtains re-scaled features $\bm{F}^{R}$ through several Up-sampling Blocks; 3) A style translator $\bm{\phi}^{S}$, composed of Styling Blocks, produces style features $\bm{F}^{S}$ as the modulation control signal for the next stage; 4) A feature decoder $\bm{\phi}^{O}$ recovers the input features as $\bm{F}^{O}$ through cascaded Space-aware Style Modulation modules for loss and anomaly map calculation.
  • Figure 5: Visual comparison of anomaly maps for different types of objects on COCO-AD (Left), MVTec AD (Middle), and VisA (Right) datasets. InvAD produces more compact segmentations that match the ground truths more closely, with weaker responses in normal areas.
  • ...and 3 more figures
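
The Figure 4 caption says the recovered features $\bm{F}^{O}$ are compared against the input features $\bm{F}^{I}$ to compute the anomaly map, but it does not say how. The sketch below shows one common choice in reconstruction-based methods, and it is an assumption rather than the authors' stated procedure: a per-location cosine distance at each scale, nearest-neighbor upsampling to a square output size, and a plain average across scales. All function names are hypothetical, and features are represented as nested `[H][W][C]` lists to keep the example dependency-free.

```python
import math


def cosine_distance_map(feat_in, feat_out, eps=1e-12):
    """Per-location 1 - cosine similarity between two [H][W][C] feature maps."""
    dist = []
    for row_in, row_out in zip(feat_in, feat_out):
        dist_row = []
        for a, b in zip(row_in, row_out):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            dist_row.append(1.0 - dot / max(norm, eps))
        dist.append(dist_row)
    return dist


def upsample_nearest(amap, size):
    """Nearest-neighbor resize of an [H][W] map to [size][size]."""
    h, w = len(amap), len(amap[0])
    return [[amap[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]


def anomaly_map(feats_in, feats_out, size):
    """Average the upsampled distance maps over all scales (assumed fusion rule)."""
    maps = [upsample_nearest(cosine_distance_map(a, b), size)
            for a, b in zip(feats_in, feats_out)]
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(size)]
            for i in range(size)]


# One scale, 1x2 spatial grid, 2 channels: the left location is reconstructed
# exactly, the right location is reconstructed as an orthogonal vector.
f_in = [[[1.0, 0.0], [0.0, 1.0]]]
f_out = [[[1.0, 0.0], [1.0, 0.0]]]
amap = anomaly_map([f_in], [f_out], size=2)
print(amap)
```

Locations where the reconstruction matches the input score near 0 and poorly reconstructed locations score higher, which is the behavior the anomaly maps in Figure 5 rely on.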