Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Detection

Jerry Chun-Wei Lin; Pi-Wei Chen; Chao-Chun Chen

TL;DR

The FUTUREG framework, which incorporates two innovative modules: the Feature Purification Module (FPM) and the CFG Decoder, achieves state-of-the-art performance in multi-class OOD settings and remains competitive in industrial anomaly detection scenarios.

Abstract

Reconstruction networks are widely used in unsupervised anomaly and Out-of-Distribution (OOD) detection because they do not depend on labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities: the added categories expand the boundary of normality, anomalies blend in within that expanded boundary, and detection accuracy drops. We introduce the FUTUREG framework, which incorporates two innovative modules: the Feature Purification Module (FPM) and the CFG Decoder. The FPM constrains the normality boundary within the latent space to effectively filter out anomalous features, while the CFG Decoder uses layer-wise encoder representations to guide the reconstruction of filtered features, preserving fine-grained details. Together, these modules increase the reconstruction error for anomalies while ensuring high-quality reconstructions for normal samples. Our results demonstrate that FUTUREG achieves state-of-the-art performance in multi-class OOD settings and remains competitive in industrial anomaly detection scenarios.
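The idea of filtering features against a constrained normality boundary can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `purify_features` and `anomaly_scores`, the use of cosine similarity, the softmax weighting, and the toy prototypes are all assumptions made for illustration. The only details taken from this page are that anomalous features are filtered in the latent space, that a top-$k$ selection is involved, and that anomalies should end up with a larger reconstruction error.

```python
import numpy as np

def purify_features(features, prototypes, k=2):
    """Hypothetical purification step: rebuild each feature from its
    k most similar normality prototypes (an assumption, not the paper's FPM)."""
    # Cosine similarity between every feature (n, d) and every prototype (m, d).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T                                    # shape (n, m)

    # Indices and similarities of the k closest prototypes per feature.
    topk = np.argsort(-sim, axis=1)[:, :k]           # shape (n, k)
    topk_sim = np.take_along_axis(sim, topk, axis=1)

    # Softmax restricted to the selected prototypes.
    w = np.exp(topk_sim - topk_sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    # Weighted mix of the selected prototypes, shape (n, d).
    return (w[..., None] * prototypes[topk]).sum(axis=1)

def anomaly_scores(features, purified):
    """Squared L2 distance between original and purified features."""
    return ((features - purified) ** 2).sum(axis=1)

# Toy data (hypothetical): three prototypes, one normal-like and one outlying feature.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
features = np.array([[0.9, 0.1], [-3.0, -3.0]])

purified = purify_features(features, prototypes, k=1)
scores = anomaly_scores(features, purified)
```

In this toy setup the second feature lies far from every prototype, so replacing it with a prototype mix moves it a long way and its score is much larger than that of the first feature. A smaller `k` makes the selection stricter; how the actual FPM chooses and combines prototypes is described in the paper itself, not here.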

Paper Structure (21 sections, 21 equations, 6 figures, 6 tables)

Introduction
Related Work
Prototype learning
Vision Transformer
Proposed Framework
Overview
The analysis of reason behind Identity shortcut
Normality Prototype Retrieval Module
Feature Purification Module
Cross-level Feature Guiding (CFG) Decoder
Objective Function
Experiments
Implementation details
OOD tasks comparison
Image-level and pixel-level industrial anomaly comparison
...and 6 more sections

Figures (6)

Figure 1: (a) illustrates the relationship between a single-class dataset and the latent space, specifically showing how the learned boundary excludes the anomalous embedding; (b) illustrates the relationship between a multi-class dataset and the latent space, where the normality boundary tends to include anomalous embeddings; (c) illustrates our design philosophy: FUTUREG constrains the boundary for multi-class data into a class-conditioned boundary that applies exclusively to specific semantic classes in the latent space.
Figure 2: The overall architecture of FUTUREG
Figure 3: (a) illustrates the workflow of the proposed FPM; (b) illustrates how top-$k$ selection allows us to form a selective boundary that eliminates potentially anomalous embeddings.
Figure 4: Visual comparison between a vanilla ViT and the proposed framework. Our proposed framework converts the anomalous pixels into normality-like pixels, causing a larger reconstruction error for the anomalous sample. The first and second rows show an anomaly-free item and an anomalous item, respectively. The third and fourth rows visualize the reconstructed embeddings of UniAD and our proposed MAD-ProFP.
Figure 5: Visualization of latent features across all classes in the MVTec and MNIST datasets. (a) Each cluster represents a distinct class from the MVTec dataset, showing significant diversity in feature representation, indicative of unique features for each class. (b) Clusters from the MNIST dataset exhibit a relatively compact distribution, indicating that each class has a less diverse feature representation. This visualization highlights the differences in feature variability between industrial and standard digit datasets.
...and 1 more figures
