Explainable Deep Convolutional Multi-Type Anomaly Detection

Alex George, Lyudmila Mihaylova, Sean Anderson

TL;DR

The paper tackles the need for explainable anomaly detection that can identify and distinguish multiple anomaly types across diverse objects without training separate models or relying on heavy vision-language models. It introduces MultiTypeFCDD, a lightweight extension of FCDD that outputs multi-channel, per-type anomaly heatmaps using only image-level labels, along with a specialized loss and a balanced training strategy. Evaluated on Real-IAD, the method achieves competitive image-level AUROC and strong localisation (P-AUROC/AUPRO) while using a fraction of the parameters and inference time of VLM baselines. The approach enables practical deployment in real-time or embedded systems, offering scalable, interpretable, multi-type anomaly detection across object categories.

Abstract

Explainable anomaly detection methods often have the capability to identify and spatially localise anomalies within an image but lack the capability to differentiate the type of anomaly. Furthermore, they often require the costly training and maintenance of separate models for each object category. The lack of specificity is a significant research gap because identifying the type of anomaly (e.g., "Crack" vs. "Scratch") is crucial for accurate diagnosis that facilitates cost-saving operational decisions across diverse application domains. While some recent large-scale Vision-Language Models (VLMs) have begun to address this, they are computationally intensive and memory-heavy, restricting their use in real-time or embedded systems. We propose MultiTypeFCDD, a simple and lightweight convolutional framework designed as a practical alternative for explainable multi-type anomaly detection. MultiTypeFCDD uses only image-level labels to learn and produce multi-channel heatmaps, where each channel is trained to correspond to a specific anomaly type. The model functions as a single, unified framework capable of differentiating anomaly types across multiple object categories, eliminating the need to train and manage separate models for each object category. We evaluated our proposed method on the Real-IAD dataset and it delivers competitive results (96.4% I-AUROC) at just over 1% the size of state-of-the-art VLM models used for similar tasks. This makes it a highly practical and viable solution for real-world applications where computational resources are tightly constrained.

Paper Structure

This paper contains 29 sections, 6 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of multi-type anomaly detection on two objects (USB and Woodstick) from the Real-IAD dataset [wang2024real]. (a) In multi-class anomaly detection, most explainable methods require training a separate network to detect anomalies in each object class. (b) Recent research has introduced unified frameworks to detect anomalies across multiple objects. However, both (a) and (b) can only detect that an anomaly is present, not what type it is. (c) Our proposed framework enables the detection of multiple types of anomalies occurring across diverse objects using a single, unified model.
  • Figure 2: Illustration of the MultiTypeFCDD framework. An input image $X$ is fed into a convolutional network $\phi$, which produces multi-channel outputs $A_k$ corresponding to different anomaly types. These outputs are then resized to heatmaps $A'_k$ that match the input image dimensions.
  • Figure 3: Examples of multi-type anomaly detection across multiple objects ($\alpha = 0.2$). Each pair of images in the grid shows the input image with the ground-truth anomaly highlighted by the circle (left) and the corresponding predicted anomaly heatmap by MultiTypeFCDD overlaid on the input image (right), with colours defining each anomaly type indicated in the legend on the right.
  • Figure 4: Example of multi-type anomaly detection using manually edited synthetic test images containing multiple co-occurring anomalies ($\alpha = 0.2$). For each image, the corresponding overlaid anomaly heatmap highlights all detected anomalous regions.
  • Figure 5: Comparison of anomaly heatmaps for a test object (audiojack) across different $\alpha$ levels: 0.1, 0.2, 0.4.
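
The pipeline described in the Figure 2 caption (low-resolution per-type outputs $A_k$ resized to full-size heatmaps $A'_k$) can be illustrated with a small sketch. This is not the authors' implementation: the network $\phi$ is replaced by hard-coded toy outputs, nearest-neighbour resizing is swapped in as an assumed stand-in for whatever resizing the paper uses, and reducing each heatmap to an image-level score by taking its maximum is likewise an assumption made only for illustration. All function names are hypothetical.

```python
from typing import List

Map = List[List[float]]  # one 2-D map stored as rows of floats


def upsample_nearest(low_res: Map, out_h: int, out_w: int) -> Map:
    """Resize a 2-D map to (out_h, out_w) by nearest-neighbour lookup.

    Assumed stand-in for the resizing step; the paper may use another method.
    """
    in_h, in_w = len(low_res), len(low_res[0])
    return [
        [low_res[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def per_type_heatmaps(outputs: List[Map], out_h: int, out_w: int) -> List[Map]:
    """Turn K low-resolution outputs A_k into K input-sized heatmaps A'_k."""
    return [upsample_nearest(a_k, out_h, out_w) for a_k in outputs]


def image_level_scores(heatmaps: List[Map]) -> List[float]:
    """One score per anomaly type: the peak of each heatmap (an assumption)."""
    return [max(max(row) for row in h) for h in heatmaps]


# Toy stand-in for the output of phi(X): K = 2 anomaly types, 2x2 maps each.
outputs = [
    [[0.1, 0.9], [0.0, 0.2]],  # channel 0, e.g. a hypothetical "Crack" type
    [[0.0, 0.1], [0.3, 0.1]],  # channel 1, e.g. a hypothetical "Scratch" type
]

heatmaps = per_type_heatmaps(outputs, out_h=4, out_w=4)  # match a 4x4 input
scores = image_level_scores(heatmaps)
predicted = scores.index(max(scores))  # channel with the strongest response

print(scores, predicted)
```

Keeping one channel per anomaly type means the same upsampled tensor serves both purposes the summary describes: each channel is a localisation map, and comparing channels indicates which type is present.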