HLGFA: High-Low Resolution Guided Feature Alignment for Unsupervised Anomaly Detection

Han Zhou, Yuxuan Gao, Yinchao Du, Xuezhe Zheng

TL;DR

HLGFA addresses unsupervised industrial anomaly detection by exploiting cross-resolution feature consistency between high- and low-resolution views. It uses a frozen backbone and a structure-detail decoupled guidance mechanism to refine low-resolution features under high-resolution supervision, so that cross-resolution misalignment becomes the anomaly cue. A noise-aware data augmentation strategy improves robustness to nuisance patterns common in industrial images. On the MVTec AD benchmark, HLGFA reports 97.9% pixel-level AUROC and 97.5% image-level AUROC without relying on reconstruction or memory banks, which makes it practical for real-world inspection across diverse defect types.

Abstract

Unsupervised industrial anomaly detection (UAD) is essential for modern manufacturing inspection, where defect samples are scarce and reliable detection is required. In this paper, we propose HLGFA, a high-low resolution guided feature alignment framework that learns normality by modeling cross-resolution feature consistency between high-resolution and low-resolution representations of normal samples, instead of relying on pixel-level reconstruction. Dual-resolution inputs are processed by a shared frozen backbone to extract multi-level features, and high-resolution representations are decomposed into structure and detail priors to guide the refinement of low-resolution features through conditional modulation and gated residual correction. During inference, anomalies are naturally identified as regions where cross-resolution alignment breaks down. In addition, a noise-aware data augmentation strategy is introduced to suppress nuisance-induced responses commonly observed in industrial environments. Extensive experiments on standard benchmarks demonstrate the effectiveness of HLGFA, achieving 97.9% pixel-level AUROC and 97.5% image-level AUROC on the MVTec AD dataset, outperforming representative reconstruction-based and feature-based methods.
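
The inference idea described in the abstract, scoring regions where high- and low-resolution features disagree, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not state which discrepancy measure is used, so cosine distance stands in for it; the learnable refinement module is omitted; the two feature grids are assumed to be already resized to the same spatial size; and taking the maximum as the image-level score is an assumed choice. Plain Python lists keep the sketch dependency-free, whereas a real implementation would operate on tensors.

```python
import math

def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)

def anomaly_map(hr_feats, lr_feats):
    """Per-location discrepancy between two H x W x C feature grids (nested lists)."""
    return [
        [cosine_distance(h, l) for h, l in zip(hr_row, lr_row)]
        for hr_row, lr_row in zip(hr_feats, lr_feats)
    ]

def image_score(amap):
    """Image-level score: the largest per-location discrepancy (assumed choice)."""
    return max(max(row) for row in amap)

# Toy 2 x 2 grid with 3-channel features (hypothetical values).
# Three locations differ only in magnitude; the bottom-right one differs in direction.
hr = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
      [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]]
lr = [[[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
      [[0.0, 0.0, 0.5], [1.0, -1.0, 0.0]]]

amap = anomaly_map(hr, lr)
score = image_score(amap)
```

In this toy input, locations whose features point in the same direction receive a discrepancy near zero regardless of magnitude, while the bottom-right location, whose features are orthogonal, receives the largest value and determines the image-level score.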

Paper Structure (17 sections, 10 equations, 6 figures, 3 tables)


Figures (6)

  • Figure 1: Visualization of feature responses extracted by a pretrained backbone under different resolutions. Normal samples show consistent activation patterns across high- and low-resolution views, while anomalous samples exhibit pronounced response shifts after resolution reduction due to the degradation of fine-grained structural cues.
  • Figure 2: High-resolution (HR) and low-resolution (LR) images are processed by a shared frozen backbone to extract multi-scale features. The learnable HLGFA module performs structure-guided refinement of low-resolution features using high-resolution representations. Anomalies are detected as regions where cross-resolution feature alignment fails.
  • Figure 3: Illustration of the proposed structure-detail decoupled guidance. High-resolution (HR) features are decomposed into a structure prior and a detail prior. The structure prior captures stable semantic layouts via multi-scale depthwise convolutions, while the detail prior preserves informative local cues through lightweight spatial alignment and channel projection, enabling stable cross-resolution guidance.
  • Figure 4: Visualization of the proposed structure-detail decoupled guidance and structure-based reliability modulation. HR and LR images are encoded into multi-scale features. During inference, anomaly maps derived from cross-resolution discrepancies are further modulated by a structure-based reliability weight, which suppresses spurious responses in structurally unstable regions.
  • Figure 5: The top row shows typical nuisance patterns commonly observed in defect-free products, including hairs, stains, cracks, and contamination noise. The bottom row illustrates our noise-aware augmentation strategy, where sparse point noise and structured stripe noise are synthetically injected into normal samples to simulate real-world contamination.
  • ...and 1 more figures
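
The noise-aware augmentation described in the Figure 5 caption, where sparse point noise and structured stripe noise are injected into normal samples, might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the noise densities and strengths, the use of horizontal stripes, the application probability `p`, and the representation of an image as a nested list of grayscale values in [0, 1] are all hypothetical.

```python
import random

def add_point_noise(img, density=0.01, rng=None):
    """Replace a sparse random subset of pixels with a random intensity in [0, 1)."""
    rng = rng or random.Random()
    out = [row[:] for row in img]  # copy so the input image is left untouched
    for y in range(len(out)):
        for x in range(len(out[y])):
            if rng.random() < density:
                out[y][x] = rng.random()
    return out

def add_stripe_noise(img, num_stripes=2, strength=0.2, rng=None):
    """Shift the intensity of a few randomly chosen rows (horizontal stripes)."""
    rng = rng or random.Random()
    out = [row[:] for row in img]
    rows = rng.sample(range(len(out)), k=min(num_stripes, len(out)))
    for y in rows:
        offset = rng.uniform(-strength, strength)
        # Clamp so the augmented image stays in the valid [0, 1] range.
        out[y] = [min(1.0, max(0.0, v + offset)) for v in out[y]]
    return out

def noise_aware_augment(img, p=0.5, rng=None):
    """Apply each noise type independently with probability p (assumed policy)."""
    rng = rng or random.Random()
    out = [row[:] for row in img]
    if rng.random() < p:
        out = add_point_noise(out, rng=rng)
    if rng.random() < p:
        out = add_stripe_noise(out, rng=rng)
    return out

# Usage on a hypothetical 8 x 8 uniform gray "normal" image.
normal = [[0.5] * 8 for _ in range(8)]
augmented = noise_aware_augment(normal, p=1.0, rng=random.Random(0))
```

Training on such perturbed normal samples is intended to teach the model that this kind of contamination is not a defect, which matches the caption's stated goal of simulating real-world nuisance patterns.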