Template-Based Feature Aggregation Network for Industrial Anomaly Detection

Wei Luo; Haiming Yao; Wenyong Yu

Abstract

Industrial anomaly detection is central to product quality control, which makes effective detection models highly valuable. Existing feature-reconstruction methods perform well, but they are prone to shortcut learning, which can lead them to reconstruct anomalous features as faithfully as normal ones. To address this concern, we present a novel feature-reconstruction model, the Template-based Feature Aggregation Network (TFA-Net), which detects anomalies via template-based feature aggregation. Specifically, TFA-Net first extracts multiple hierarchical features from a pre-trained convolutional neural network for both a fixed template image and an input image. Instead of directly reconstructing the input features, TFA-Net aggregates them onto the template features, which filters out anomalous features that exhibit low similarity to the normal template features. Next, TFA-Net uses the template features, which now carry the normal features fused from the input, to refine feature details and obtain the reconstructed feature map. Finally, defective regions are located by comparing the input features with the reconstructed features. In addition, a random masking strategy applied to the input features improves the overall inspection performance of the model. Our template-based feature aggregation scheme yields a nontrivial and meaningful feature reconstruction task. The simple yet efficient TFA-Net achieves state-of-the-art detection performance on various real-world industrial datasets and meets the real-time demands of industrial scenarios, making it well suited to practical applications. Code is available at https://github.com/luow23/TFA-Net.
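
The aggregation and comparison steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that). It assumes that aggregation behaves like plain dot-product cross-attention with template tokens as queries and input tokens as keys and values, that there are no learned projections, that template and input tokens are spatially aligned, and that the anomaly score is a per-token cosine distance. The function names, the `scale` parameter, and the 2-D toy features are all hypothetical, and the refinement and random-masking steps are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(values):
    # Numerically stable softmax over a list of floats.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_onto_template(template_tokens, input_tokens, scale=5.0):
    # Assumed mechanism: each template token (query) gathers a weighted
    # average of input tokens (keys/values). Input tokens that are
    # dissimilar to every template token receive near-zero weight.
    dim = len(input_tokens[0])
    aggregated = []
    for query in template_tokens:
        weights = softmax([scale * dot(query, key) for key in input_tokens])
        aggregated.append(
            [sum(w * tok[d] for w, tok in zip(weights, input_tokens))
             for d in range(dim)]
        )
    return aggregated

def cosine_distance(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / norm

def anomaly_scores(input_tokens, recon_tokens):
    # Compare each input token with the reconstruction at the same position.
    return [cosine_distance(x, r) for x, r in zip(input_tokens, recon_tokens)]

# Toy data: the third input token is "defective" (far from its template).
template = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

recon = aggregate_onto_template(template, inputs)
scores = anomaly_scores(inputs, recon)
print(scores)
```

In this toy setting the defective token contributes almost nothing to any template query, so its position is reconstructed from the normal tokens and receives a much larger score than the other two positions.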

Paper Structure (31 sections, 8 equations, 18 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Embedding-based Methods
Reconstruction-based Methods
The TFA-Net Methodology
Overall Architecture
Multiple Hierarchical Fusion Feature Extraction
Template-based Feature Aggregation Mechanism and Feature Detail Refinement Module
Discussion on TFAM and FDRM
The reason for discarding the input features after passing through TFAM
The specific effect of FDRM
Training and Testing Procedures
Training procedure
Testing procedure
Experiments
...and 16 more sections

Figures (18)

Figure 1: The comparison between direct feature reconstruction and template-based feature aggregation. Rec. and Ano. denote the reconstructed feature and anomaly map, respectively.
Figure 2: The comparison of feature aggregation methods based on CNN and ViT. For images in various orientations, CNN models may struggle with feature aggregation because of their translation-equivariance inductive bias. In contrast, ViT exhibits better feature aggregation capabilities, as it lacks the inductive bias of CNNs and possesses global modeling capabilities.
Figure 3: The overall architecture of the TFA-Net approach. The workflow of TFA-Net can be divided into four stages: multiple hierarchical fusion feature extraction, template-based feature aggregation mechanism (TFAM), feature detail refinement module (FDRM), and dual-mode anomaly segmentation. First, TFA-Net employs a pre-trained CNN to extract multi-scale features. Next, TFAM filters out anomalous features while retaining normal features. FDRM then refines the result to produce the reconstructed features. Finally, the defect regions are localized by leveraging both the reconstructed features and the input features.
Figure 4: Illustration of multiple hierarchical fusion feature extraction, which encompasses the process of resizing feature maps from various scales to a uniform size and subsequently concatenating them in the channel dimension to obtain multi-level fused features.
Figure 5: Comparison between vanilla ViT and the template-based feature aggregation mechanism (TFAM). (a) Vanilla ViT mechanism. In the ViT mechanism, defect features exhibit the highest similarity with themselves, so they self-aggregate, which results in a perfect reconstruction of the defects. (b) TFAM. In TFAM, defect features are dissimilar to template features, making it difficult for them to aggregate onto the template features.
...and 13 more figures
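
The fusion step described in the Figure 4 caption (resize feature maps from several scales to one size, then concatenate along the channel dimension) can be sketched as follows. This is a minimal illustration rather than the paper's code: nearest-neighbor resizing is an assumption, since the caption does not name the interpolation method, and the function names and nested-list layout (a feature map is a list of channels, each a list of rows) are hypothetical.

```python
def resize_nearest(channel, out_h, out_w):
    # Nearest-neighbor resize of one H x W channel (assumed interpolation).
    in_h, in_w = len(channel), len(channel[0])
    return [
        [channel[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]

def fuse_levels(levels, out_h, out_w):
    # Resize every channel of every level to (out_h, out_w), then
    # concatenate all channels into a single fused feature map.
    fused = []
    for feature_map in levels:
        for channel in feature_map:
            fused.append(resize_nearest(channel, out_h, out_w))
    return fused

# Toy example: a shallow level (2 channels, 4x4) and a deeper level (3 channels, 2x2).
shallow = [[[float(c + i + j) for j in range(4)] for i in range(4)] for c in range(2)]
deep = [[[float(10 * c + 2 * i + j) for j in range(2)] for i in range(2)] for c in range(3)]

fused = fuse_levels([shallow, deep], 4, 4)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

The fused map has as many channels as all levels combined, and every channel shares the same spatial size, so each spatial position carries features from every scale.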
