StructCore: Structure-Aware Image-Level Scoring for Training-Free Unsupervised Anomaly Detection

Joongwon Chae, Lihui Luo, Yang Liu, Runming Wang, Dongmei Yu, Zeming Liang, Xi Yuan, Dayan Zhang, Zhenglin Chen, Peiwu Qin, Ilmoon Chae

TL;DR

StructCore is a training-free, structure-aware image-level scoring method that goes beyond max pooling. It achieves image-level AUROC of 99.6% on MVTec AD and 98.4% on VisA by exploiting structural signatures in the anomaly score map that max pooling misses.

Abstract

Max pooling is the de facto standard for converting anomaly score maps into image-level decisions in memory-bank-based unsupervised anomaly detection (UAD). However, because it relies on a single extreme response, it discards most information about how anomaly evidence is distributed and structured across the image, often causing normal and anomalous scores to overlap. We propose StructCore, a training-free, structure-aware image-level scoring method that goes beyond max pooling. Given an anomaly score map $S$, StructCore computes a low-dimensional structural descriptor $\phi(S)$ that captures distributional and spatial characteristics, and refines image-level scoring via a diagonal Mahalanobis calibration estimated from train-good samples, without modifying pixel-level localization. StructCore achieves image-level AUROC scores of 99.6% on MVTec AD and 98.4% on VisA, demonstrating robust image-level anomaly detection by exploiting structural signatures missed by max pooling.
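
The descriptor-plus-calibration idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the components of the descriptor, so the five features below (maximum, mean, standard deviation, top-fraction mean, and a neighbour-difference roughness term) are hypothetical stand-ins, as are the function names and the `eps` and `top_frac` parameters. Only the overall shape follows the text: statistics are estimated from train-good score maps, and a test map is scored by a diagonal Mahalanobis distance to them.

```python
import math
from statistics import fmean, pvariance


def roughness(score_map):
    """Mean absolute difference between adjacent cells (a simple spatial cue).

    Visiting only the right and lower neighbour counts each 4-connected pair once.
    """
    diffs = []
    for r, row in enumerate(score_map):
        for c, v in enumerate(row):
            if c + 1 < len(row):
                diffs.append(abs(v - row[c + 1]))
            if r + 1 < len(score_map):
                diffs.append(abs(v - score_map[r + 1][c]))
    return fmean(diffs) if diffs else 0.0


def descriptor(score_map, top_frac=0.05):
    """Hypothetical phi(S): [max, mean, std, top-fraction mean, roughness].

    These features are illustrative assumptions, not the paper's definition.
    """
    flat = sorted((v for row in score_map for v in row), reverse=True)
    k = max(1, int(len(flat) * top_frac))
    mean = fmean(flat)
    std = math.sqrt(pvariance(flat, mean))
    return [flat[0], mean, std, fmean(flat[:k]), roughness(score_map)]


def fit_calibration(train_good_maps, eps=1e-6):
    """Per-feature mean and variance of phi(S) over train-good score maps."""
    feats = [descriptor(m) for m in train_good_maps]
    cols = list(zip(*feats))
    mu = [fmean(col) for col in cols]
    # eps keeps the division stable when a feature is constant on normal data.
    var = [pvariance(col, m) + eps for col, m in zip(cols, mu)]
    return mu, var


def struct_distance(score_map, mu, var):
    """Diagonal Mahalanobis distance of phi(S) to the train-good statistics."""
    phi = descriptor(score_map)
    return math.sqrt(sum((p - m) ** 2 / v for p, m, v in zip(phi, mu, var)))
```

Because the covariance is restricted to its diagonal, calibration needs only a mean and a variance per feature, and it reads the score map without changing it, which is consistent with the abstract's statement that pixel-level localization is left unmodified.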

Paper Structure (26 sections, 12 equations, 4 figures, 12 tables)

Figures (4)

  • Figure 1: Overview of the StructCore framework. Normal images are encoded by a frozen DINOv2 ViT-B/14 backbone using multi-layer skip feature extraction. Patch features from multiple layers are concatenated and compressed via a fixed random projection, followed by greedy coreset selection to construct a category-specific memory bank. At inference time, an optional routing bank selects a relevant memory subset for efficient nearest-neighbor matching, producing an anomaly score map. While the base image-level score is obtained by conventional max pooling, StructCore augments it with a structure-aware score computed from a low-dimensional descriptor of the anomaly map, calibrated using statistics from train-good samples. StructCore refines image-level decisions without altering pixel-level localization.
  • Figure 2: Base vs. hybrid image scores illustrating where StructCore changes the image-level decision. The x-axis is the base score $S_{\mathrm{base}}=\max(S)$ and the y-axis is the final score $S_{\mathrm{hyb}}=S_{\mathrm{base}}+\lambda_{\mathrm{auto}}\,D_{\mathrm{struct}}$. Points are colored by ground-truth label. Thresholds $\tau_{\mathrm{base}}$ and $\tau_{\mathrm{hyb}}$ are set to the 99.5% quantile of train-good scores. Shaded cross-over regions mark samples whose decisions differ between $S_{\mathrm{base}}$ and $S_{\mathrm{hyb}}$ (counts are shown in each panel).
  • Figure 3: Qualitative results on MVTec AD (all categories).
  • Figure 4: Qualitative results on VisA (all categories).
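
The scoring rule in the Figure 2 caption, $S_{\mathrm{hyb}}=S_{\mathrm{base}}+\lambda_{\mathrm{auto}}\,D_{\mathrm{struct}}$ with thresholds at the 99.5% quantile of train-good scores, can be sketched as follows. This is an illustration under assumptions: the captions do not say how $\lambda_{\mathrm{auto}}$ is chosen, so it appears here as a plain parameter `lam`, the structural distance `d_struct` is taken as given, and the linear-interpolation quantile and all function names are hypothetical choices.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed convention)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def hybrid_score(score_map, d_struct, lam):
    """S_hyb = S_base + lam * D_struct, where S_base is the max of the score map."""
    s_base = max(v for row in score_map for v in row)
    return s_base + lam * d_struct


def fit_threshold(train_good_scores, q=0.995):
    """Threshold tau set to the q-quantile of scores on train-good images."""
    return quantile(train_good_scores, q)


def decision_changes(s_base, s_hyb, tau_base, tau_hyb):
    """True when base and hybrid scoring disagree (a cross-over sample in Figure 2)."""
    return (s_base > tau_base) != (s_hyb > tau_hyb)
```

Each score is compared against a threshold fitted on its own train-good distribution, so a sample can sit below $\tau_{\mathrm{base}}$ yet above $\tau_{\mathrm{hyb}}$; these are the samples in the shaded cross-over regions.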