APRIL-GAN: A Zero-/Few-Shot Anomaly Classification and Segmentation Method for CVPR 2023 VAND Workshop Challenge Tracks 1&2: 1st Place on Zero-shot AD and 4th Place on Few-shot AD

Xuhai Chen, Yue Han, Jiangning Zhang

TL;DR

This work tackles industrial anomaly detection in zero- and few-shot regimes by extending CLIP with per-stage linear adapters that map image features into the CLIP joint embedding space, so that text prompts can guide anomaly classification and segmentation. In the few-shot setting, memory banks store reference features across encoder stages to refine anomaly maps without fine-tuning the adapters. The approach took first place in the zero-shot track and fourth place overall in the few-shot track of the VAND Challenge (with the best classification F1 score in the latter), and also reports strong results on MVTec AD and VisA, suggesting cross-dataset generalization and practical applicability for rapidly adapting to new product categories.

Abstract

In this technical report, we briefly introduce our solution for the Zero/Few-shot Track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge. For industrial visual inspection, building a single model that can be rapidly adapted to numerous categories with no or only a few normal reference images is a promising research direction, primarily because of the vast variety of product types. For the zero-shot track, we propose a solution based on the CLIP model with extra linear layers. These layers map the image features to the joint embedding space, so that they can be compared with the text features to generate the anomaly maps. In addition, when reference images are available, we use multiple memory banks to store their features and compare them with the features of the test images during the testing phase. In this challenge, our method achieved first place in the zero-shot track, especially excelling in segmentation, with an F1 score improvement of 0.0489 over the second-ranked participant. Furthermore, in the few-shot track, we secured fourth position overall, with our classification F1 score of 0.8687 ranking first among all participating teams.
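
The zero-shot comparison described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the softmax over a normal/anomalous text pair, the `temperature` value, and the summation over encoder stages are all assumptions made here for illustration. The source only states that extra linear layers map image features into the joint embedding space, where they are compared with text features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def zero_shot_anomaly_scores(patch_feats, linear_w, text_feats, temperature=0.07):
    """Per-patch anomaly probability for one encoder stage (hypothetical helper).

    patch_feats: (N, D_img) patch features from the image encoder.
    linear_w:    (D_img, D_joint) weights of the extra linear layer (assumed shape).
    text_feats:  (2, D_joint) text features, rows = [normal, anomalous] (assumed layout).
    """
    projected = l2_normalize(patch_feats @ linear_w)   # map into the joint space
    text = l2_normalize(text_feats)
    logits = projected @ text.T / temperature          # (N, 2) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1]                                 # probability of "anomalous"


def multi_stage_anomaly_map(stage_feats, stage_weights, text_feats, grid):
    """Sum the per-stage maps (summation is an assumption) and reshape to the patch grid."""
    maps = [
        zero_shot_anomaly_scores(f, w, text_feats).reshape(grid)
        for f, w in zip(stage_feats, stage_weights)
    ]
    return np.sum(maps, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_stages, grid, d_img, d_joint = 2, (4, 4), 32, 8
    feats = [rng.normal(size=(grid[0] * grid[1], d_img)) for _ in range(n_stages)]
    weights = [rng.normal(size=(d_img, d_joint)) for _ in range(n_stages)]
    text = rng.normal(size=(2, d_joint))
    amap = multi_stage_anomaly_map(feats, weights, text, grid)
    print(amap.shape)
```

In this sketch only the linear weights would be trained; the random arrays stand in for real CLIP features, which are not reproduced here.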

Paper Structure (11 sections, 4 equations, 4 figures, 12 tables)

Figures (4)

  • Figure 1: Overall diagram of our solution. 1) The blue dashed box represents the pipeline for the zero-shot setting. The "Linear" components denote additional linear layers and "Text" indicates the corresponding text features. Note that every "Text" block in this figure represents the same text features. 2) The orange dashed box represents the pipeline for the few-shot setting. The "memory" components represent the memory banks. The symbol with a letter C inside a circle denotes the calculation of cosine similarity.
  • Figure 2: Visualization of results in the zero-/few-shot settings. The first row shows the original image, the second row displays the zero-shot results, and the third to fifth rows present the results for 1-shot, 5-shot, and 10-shot, respectively.
  • Figure 3: Visualization results on the MVTec AD dataset under the zero-shot setting.
  • Figure 4: Visualization results on the VisA dataset under the zero-shot setting.
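
The few-shot branch in Figure 1 (memory banks compared with test features via cosine similarity) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, and scoring each test patch by one minus its highest cosine similarity to any stored reference patch, then averaging across stages, is a common nearest-neighbour choice assumed here, not something the source specifies.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_memory_bank(reference_feats):
    """Stack patch features of all normal reference images for one encoder stage.

    reference_feats: list of (N, D) arrays, one per reference image.
    Returns an (M, D) array of unit-length features.
    """
    return l2_normalize(np.concatenate(reference_feats, axis=0))


def memory_anomaly_scores(test_feats, bank):
    """Score each test patch by its cosine distance to the closest stored patch.

    test_feats: (N, D) patch features of the test image at the same stage.
    bank:       (M, D) memory bank from build_memory_bank.
    """
    sim = l2_normalize(test_feats) @ bank.T   # (N, M) cosine similarities
    return 1.0 - sim.max(axis=1)              # assumed rule: 1 - best match


def few_shot_scores(test_stage_feats, banks):
    """Average the per-stage scores (averaging is an assumption of this sketch)."""
    per_stage = [memory_anomaly_scores(f, b) for f, b in zip(test_stage_feats, banks)]
    return np.mean(per_stage, axis=0)


if __name__ == "__main__":
    # Two toy reference "images" with one 2-D patch feature each.
    bank = build_memory_bank([np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])])
    test = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
    print(memory_anomaly_scores(test, bank))
```

A patch that matches a stored reference feature scores near 0, while a patch pointing away from everything in the bank scores near 1. No training is involved: adding reference images only grows the bank.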