Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection

Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang

TL;DR

The paper tackles zero-/few-shot anomaly detection in industrial vision, where normal training data is scarce and scenes often contain multiple objects. It introduces a multi-scale memory comparison framework that combines a global memory bank for full-image features with an individual memory bank for single-object features, guided by SAM segmentation and CLIP-based text/image alignment, without any fine-tuning. In the zero-shot setting, it uses language prompts and multi-scale patch-based scoring to localize anomalies; in the few-shot setting, it augments the memory banks and compares against them across scales, integrating memory similarity with text cues to produce anomaly maps. The approach placed 4th in the zero-shot track and 2nd in the few-shot track of the VAND competition, reduces reliance on heavy text prompts for segmentation, and highlights memory-efficiency considerations for real-world industrial deployment.

Abstract

Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection. To address the challenges of data collection, researchers have introduced zero-/few-shot anomaly detection techniques that require minimal normal images for each category. However, complex industrial scenarios often involve multiple objects, presenting a significant challenge. In light of this, we propose a straightforward yet powerful multi-scale memory comparison framework for zero-/few-shot anomaly detection. Our approach employs a global memory bank to capture features across the entire image, while an individual memory bank focuses on simplified scenes containing a single object. The efficacy of our method is validated by its remarkable achievement of 4th place in the zero-shot track and 2nd place in the few-shot track of the Visual Anomaly and Novelty Detection (VAND) competition.
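
The two-bank comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes patch features are already extracted as fixed-length vectors, scores each query patch by its cosine distance to the nearest memory entry, and fuses the two banks with a simple weighted average. The function names, the `weight` parameter, and the toy random features are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each feature vector to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def memory_anomaly_scores(query, memory):
    # Score per query patch: 1 - (highest cosine similarity to any memory entry).
    # A patch that closely matches a stored normal patch scores near 0.
    sim = l2_normalize(query) @ l2_normalize(memory).T  # (num_query, num_memory)
    return 1.0 - sim.max(axis=1)

def combined_scores(query, global_bank, individual_bank, weight=0.5):
    # Assumed fusion rule: weighted average of the scores from the two banks.
    g = memory_anomaly_scores(query, global_bank)
    i = memory_anomaly_scores(query, individual_bank)
    return weight * g + (1.0 - weight) * i

# Toy data standing in for real image features (assumption: 16-dim patch features).
rng = np.random.default_rng(0)
global_bank = rng.normal(size=(64, 16))      # patches from full normal images
individual_bank = global_bank[:32].copy()    # stand-in for single-object patches

seen_patch = global_bank[3]                  # present in both banks
unseen_patch = rng.normal(size=16)           # not stored in either bank
query = np.stack([seen_patch, unseen_patch])

scores = combined_scores(query, global_bank, individual_bank)
print(scores)
```

In this toy setup the patch stored in both banks receives a score close to zero, while the unseen patch receives a higher one. In a real pipeline the per-patch scores would be reshaped to the patch grid and upsampled to form an anomaly map.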

Paper Structure

This paper contains 11 sections, 9 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: The overview of the proposed zero-shot anomaly detection and localization architecture.
  • Figure 2: The overview of the proposed few-shot anomaly detection and localization architecture.
  • Figure 3: Visualization results of individual object segmentation in the VAND challenge.