Hard-normal Example-aware Template Mutual Matching for Industrial Anomaly Detection

Zixuan Chen; Xiaohua Xie; Lingxiao Yang; Jianhuang Lai

TL;DR

This work tackles industrial anomaly detection by addressing the misclassification of hard-normal examples as anomalies. It introduces Hard-normal Example-aware Template Mutual Matching (HETMM), which combines Affine-invariant Template Mutual Matching (ATMM) with Pixel-level Template Selection (PTS) to form a robust, training-free, prototype-based decision boundary. ATMM performs forward and backward mutual matching to gain robustness to affine transformations and to better discriminate hard-normal examples from anomalies, while PTS compresses the template set by retaining both easy-normal cluster centres and hard-normal prototypes. Across six real-world datasets, HETMM delivers state-of-the-art detection and localization performance. A 60-sheet template set achieves real-time inference (~26.1 FPS), and the template set can be hot-updated by inserting new samples, which makes the method well suited to production and incremental learning scenarios.

Abstract

Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images. These detectors are trained on anomaly-free samples and successfully distinguish anomalies from most normal samples. However, hard-normal examples are scattered and lie far from most normal samples, so existing methods often mistake them for anomalies. To address this issue, we propose Hard-normal Example-aware Template Mutual Matching (HETMM), an efficient framework for building a robust prototype-based decision boundary. Specifically, HETMM employs the proposed Affine-invariant Template Mutual Matching (ATMM) to mitigate the effects of affine transformations and easy-normal examples. By mutually matching pixel-level prototypes within patch-level search spaces between the query and the template set, ATMM can accurately distinguish hard-normal examples from anomalies, achieving low false-positive and missed-detection rates. In addition, we propose Pixel-level Template Selection (PTS) to compress the original template set for faster inference. PTS selects cluster centres and hard-normal examples to preserve the original decision boundary, allowing this tiny set to achieve performance comparable to the original one. Extensive experiments demonstrate that HETMM outperforms state-of-the-art methods, while a 60-sheet tiny set achieves competitive performance and real-time inference speed (around 26.1 FPS) on a Quadro RTX 8000 GPU. HETMM is training-free and can be hot-updated by directly inserting novel samples into the template set, which can promptly address some incremental learning issues in industrial manufacturing.
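
To make the mutual matching concrete, the following is a minimal, dependency-free sketch of the idea, not the authors' implementation. It assumes that feature maps are nested lists indexed as `fmap[x][y]`, that cosine distance is the matching cost, and that the backward score takes the minimum over templates. All function names, the `radius` parameter, and the merge weight `alpha` are hypothetical. The only detail taken from the source is that the output map is a weighted sum of the forward and backward maps.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length feature vectors (assumed cost)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0

def neighbourhood(h, w, x, y, radius):
    """Coordinates of the (2*radius+1)^2 window around (x, y), clipped to the map."""
    return [(i, j)
            for i in range(max(0, x - radius), min(h, x + radius + 1))
            for j in range(max(0, y - radius), min(w, y + radius + 1))]

def forward_map(query, templates, radius=1):
    """Each query pixel looks for its closest prototype among the template
    pixels inside the patch-level window around the same location."""
    h, w = len(query), len(query[0])
    return [[min(cosine_distance(query[x][y], t[i][j])
                 for t in templates
                 for i, j in neighbourhood(h, w, x, y, radius))
             for y in range(w)] for x in range(h)]

def backward_map(query, templates, radius=1):
    """Each template pixel looks for a match inside the query window, which
    flags template content that is absent from the query.
    Assumption: scores are aggregated with a minimum over templates."""
    h, w = len(query), len(query[0])
    return [[min(min(cosine_distance(t[x][y], query[i][j])
                     for i, j in neighbourhood(h, w, x, y, radius))
                 for t in templates)
             for y in range(w)] for x in range(h)]

def merge(fwd, bwd, alpha=0.5):
    """Weighted sum of the two maps; alpha is a hypothetical weight."""
    return [[alpha * f + (1 - alpha) * b for f, b in zip(fr, br)]
            for fr, br in zip(fwd, bwd)]

def blank_map():
    """3x3 toy feature map filled with a 'background' feature."""
    return [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]

# Toy data: the template has a key object (feature [0, 1]) at the centre.
template = blank_map()
template[1][1] = [0.0, 1.0]
shifted = blank_map()          # normal query, key object shifted by one pixel
shifted[1][2] = [0.0, 1.0]
missing = blank_map()          # anomalous query, key object absent

pixel_level = forward_map(shifted, [template], radius=0)
patch_level = forward_map(shifted, [template], radius=1)
fwd_missing = forward_map(missing, [template], radius=1)
bwd_missing = backward_map(missing, [template], radius=1)
merged_missing = merge(fwd_missing, bwd_missing)
```

In this toy case, pure pixel-level matching (`radius=0`) penalizes the shifted but normal query, while searching the patch-level window does not. The forward map alone cannot see the missing object, whereas the backward map scores it at the centre pixel, so the merged map retains the signal.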

Paper Structure (17 sections, 12 equations, 12 figures, 14 tables)

Introduction
Related Works
Anomaly Detection
Multi-Prototype Representation
Methodology
Preliminary
Affine-invariant Template Mutual Matching
Pixel-level Template Selection
Overall Framework
Detection and Localization
Experiments & Analysis
Experimental Details
Evaluations on the MVTec AD
Evaluation on other datasets
Ablation Study and Sensitivity Analysis
...and 2 more sections

Figures (12)

Figure 1: t-SNE visualization of training data (balls) and queries (cubes). Existing methods' decision boundaries are dominated by the overwhelming number of easy-normal examples (blue balls). Hence, normal queries (green cubes) near the hard-normal examples (orange balls) are prone to being erroneously identified as anomalies (purple cubes), resulting in a high false-positive or missed-detection rate. To address this issue, we propose HETMM to construct a robust prototype-based decision boundary, which can accurately distinguish hard-normal examples from anomalies.
Figure 2: Visual examples of different template matching approaches. For each query space (orange frames), pixel-level template matching (a) and patch-level template matching (b) first search for similar prototypes within the corresponding pixel- and patch-level search spaces (purple frames) in the template set. The anomaly score is then obtained from the distance to the matched prototypes (red frames). However, both strategies are prone to confusing hard-normal examples (blue cubes) with anomalies (pink cubes). Specifically, pixel-level template matching often misses the best-matched prototypes of hard-normal examples because of slight affine transformations, misclassifying some hard-normal examples as anomalies. Patch-level template matching is more robust to affine transformations, but the signals of subtle anomalies may be masked by the overwhelming number of easy-normal examples (grey cubes) within the patch, so some anomalies are misclassified as normal samples. By contrast, the forward ATM (c) explores pixel-level prototypes within the corresponding patch-level search space, which can accurately distinguish between hard-normal examples and anomalies.
Figure 3: For each query object, forward ATM (a) only explores its similar prototypes (red frames) within the corresponding search space. Therefore, forward ATM may misclassify anomalies that lack key objects (purple cubes) in the query patches $\mathcal{P}^{\mathcal{Q}^{(j)}}_{x,y}$ as normal samples. By contrast, for the elements in $\mathcal{T}^{(j)}_{x,y}$, backward ATM (b) explores similar prototypes within that neighborhood, which can identify whether any key objects are absent and thus localize anomalies that forward ATM overlooks.
Figure 4: The inner structure of ATMM. For each query image, ATMM first generates bi-directional anomaly maps $\overrightarrow{S}^{(j)}$ and $\overleftarrow{S}^{(j)}$ using the forward and backward ATM modules, where the anomalous regions in these two maps are complementary to each other. The output anomaly map $S^{(j)}$ is the weighted sum of these two maps, as defined by the paper's merging equation.
Figure 5: t-SNE visualization of multi-prototype representation results over the original template set (grey balls). The prototypes selected by a random policy (a) are scattered, while $K$-Means (b) collects only easy-normal prototypes. By contrast, PTS (c) achieves better distribution coverage of the original template set, collecting OPTICS centres (green balls) and hard-normal prototypes (red balls) to preserve the original decision boundaries.
...and 7 more figures
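
The selection idea in the Figure 5 caption (keep cluster centres plus hard-normal prototypes) can be illustrated with a small, dependency-free sketch. This is not the authors' PTS. The captions name OPTICS for clustering, whereas this sketch substitutes a much simpler DBSCAN-like density rule, and it treats every low-density point as a hard-normal prototype. The function name `select_prototypes` and the parameters `eps` and `min_pts` are hypothetical, and the sketch assumes it is applied to the feature vectors collected at one pixel location across the template set.

```python
import math

def select_prototypes(features, eps=0.5, min_pts=3):
    """Return (centre_indices, hard_normal_indices) for a list of feature vectors.

    Simplified stand-in for PTS (assumption, not OPTICS):
      * a point is 'dense' if at least min_pts points (itself included)
        lie within eps of it;
      * dense points are grouped into clusters by connectivity, and each
        cluster is represented by the member closest to the cluster mean;
      * all non-dense points are kept as hard-normal prototypes.
    """
    n = len(features)
    neighbours = [[j for j in range(n)
                   if math.dist(features[i], features[j]) <= eps]
                  for i in range(n)]
    dense = [len(nb) >= min_pts for nb in neighbours]

    # Group dense points into clusters with a depth-first traversal.
    label = [None] * n
    clusters = []
    for seed in range(n):
        if not dense[seed] or label[seed] is not None:
            continue
        label[seed] = len(clusters)
        stack, members = [seed], []
        while stack:
            p = stack.pop()
            members.append(p)
            for q in neighbours[p]:
                if dense[q] and label[q] is None:
                    label[q] = len(clusters)
                    stack.append(q)
        clusters.append(members)

    # Represent each cluster by its member nearest to the cluster mean.
    centres = []
    for members in clusters:
        dim = len(features[members[0]])
        mean = [sum(features[m][d] for m in members) / len(members)
                for d in range(dim)]
        centres.append(min(members, key=lambda m: math.dist(features[m], mean)))

    hard = [i for i in range(n) if not dense[i]]
    return centres, hard

# Toy data: two easy-normal clusters and one isolated hard-normal sample.
toy = [
    (0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),  # cluster A
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.0),                            # cluster B
    (10.0, 0.0),                                                   # isolated
]
centres, hard = select_prototypes(toy)
tiny_set = sorted(centres + hard)  # indices kept in the compressed set
```

On the toy data, the nine samples compress to three: one representative per dense cluster and the isolated sample. A selection based on centres alone would drop the isolated sample, which is the failure mode the caption attributes to $K$-Means. Under the same assumptions, a hot-update would amount to appending a new sample to `toy` and rerunning the selection.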
