Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection

Qisen Cheng, Shuhui Qu, Janghwan Lee

TL;DR

The paper tackles unsupervised visual defect detection by learning a normality memory that encodes typical patterns into a discrete codebook with $K$ codes. PVQAE extends VQ-VAE with patch-aware dynamic code allocation across a resolution set $R$, governed by a Dynamic Routing Module and a multi-loss objective including a progressive budget term. Normal budget priors are learned via a Budget Prior Transformer to predict typical budget patterns, which constrain reconstruction in potential defect regions. On MVTecAD, BTAD, and MTSD, PVQAE achieves state-of-the-art or competitive performance for image- and pixel-level defect detection, while avoiding excessive memory or computation compared with some baseline methods.
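As a rough illustration of the discrete codebook idea mentioned above, the sketch below shows only the standard vector-quantization step that VQ-VAE-style models share: each patch embedding is replaced by its nearest code among $K$ codes. It does not implement PVQAE's dynamic code allocation, Dynamic Routing Module, or budget prior. The toy codebook, the patch values, and the function names `nearest_code` and `quantize` are assumptions made for this example, and treating the quantization error as a defect cue is an illustrative proxy rather than the paper's scoring rule.

```python
import math


def nearest_code(vector, codebook):
    """Return (index, squared_distance) of the codebook entry closest to vector."""
    best_index, best_dist = -1, math.inf
    for index, code in enumerate(codebook):
        # Squared Euclidean distance between the embedding and this code.
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def quantize(patches, codebook):
    """Map each patch embedding to its nearest code index and quantization error."""
    return [nearest_code(patch, codebook) for patch in patches]


# Hypothetical toy codebook with K = 3 two-dimensional codes ("normal" patterns).
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

# Two patches that lie near known codes, and one that lies far from all of them.
patches = [(0.1, -0.1), (0.9, 1.2), (5.0, 5.0)]

assignments = quantize(patches, codebook)
indices = [index for index, _ in assignments]
errors = [error for _, error in assignments]

for patch, (index, error) in zip(patches, assignments):
    print(f"patch {patch} -> code {index}, squared error {error:.2f}")
```

In this toy setting, the third patch is poorly represented by every code, so its quantization error is much larger than that of the first two. This is the intuition behind using a codebook of normal patterns as a reference for spotting deviations.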

Abstract

Unsupervised visual defect detection is critical in industrial applications, requiring a representation space that captures normal data features while detecting deviations. Achieving a balance between expressiveness and compactness is challenging; an overly expressive space risks inefficiency and mode collapse, impairing detection accuracy. We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection. Our model introduces a patch-aware dynamic code assignment scheme, enabling context-sensitive code allocation to optimize spatial representation. This strategy enhances normal-defect distinction and improves detection accuracy during inference. Experiments on MVTecAD, BTAD, and MTSD datasets show our method achieves state-of-the-art performance.

Paper Structure

This paper contains 24 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Concept of unsupervised defect detection using patch-aware codebook learning. Visual patterns of the normal class are encoded into a discrete codebook; codes are then dynamically allocated to each sample with appropriate representation capacity, addressing the expressiveness-versus-compactness dilemma in prior works. The learned normal allocations are also leveraged for defect detection.
  • Figure 2: Visualization of the PVQAE model. It extends the vector quantization technique with 1) patch-aware codebook learning, which optimizes the representation capacity (i.e., the budget) for each sample, and 2) a normal budget prior learning step, which records the regular code usage patterns.
  • Figure 3: Detailed architecture of the Dynamic Routing Module.
  • Figure 4: Pixel-level defect detection samples on the MVTecAD dataset.
  • Figure 5: Defect detection performance on MVTecAD vs. budget loss weight.