GATE-AD: Graph Attention Network Encoding For Few-Shot Industrial Visual Anomaly Detection

Aggelos Psiris, Yannis Panagakis, Maria Vakalopoulou, Georgios Th. Papadopoulos

Abstract

Few-Shot Industrial Visual Anomaly Detection (FS-IVAD) is a critical task in modern manufacturing, where automated product inspection systems must identify rare defects using only a handful of normal (defect-free) training samples. In this context, the current study introduces GATE-AD, a novel reconstruction-based approach. Specifically, the proposed framework employs a masked, representation-aligned Graph Attention Network (GAT) encoding scheme to learn robust appearance patterns of normal samples. Treating dense, patch-level visual feature tokens as graph nodes, the model uses stacked self-attentional layers to adaptively encode complex, irregular, non-Euclidean local relations. The graph is enhanced with a representation alignment component grounded in a learnable latent space, where areas with high reconstruction residuals (i.e., defects) are assessed using a Scaled Cosine Error (SCE) objective function. Extensive comparative evaluation on the MVTec AD, VisA, and MPDD industrial defect detection benchmarks demonstrates that GATE-AD achieves state-of-the-art performance across the $1$- to $8$-shot settings, combining the highest detection accuracy (an increase of up to $1.8\%$ in image-level AUROC in the 8-shot MPDD setting) with the lowest per-image inference latency (at least $25.05\%$ faster) compared to the best-performing methods in the literature. To facilitate reproducibility and further research, the source code of GATE-AD is available at https://github.com/gthpapadopoulos/GATE-AD.
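The abstract describes the pipeline only at a high level. The NumPy sketch below illustrates its main ingredients under explicit assumptions, and it is not the GATE-AD implementation. Patch tokens on an h x w grid are linked to their 4-connected neighbours, an assumed graph construction. The sketch uses a standard single-head GAT layer (Velickovic et al., 2018) and GraphMAE-style masking that replaces hidden nodes with a mask token. It also uses the Scaled Cosine Error in the form popularized by GraphMAE, the mean of (1 - cos(x_i, x̂_i))^γ. The per-patch score 1 - cos and max-pooling to an image-level score are further assumptions. The representation alignment component is omitted, and all weights are random and untrained. The names `grid_adjacency`, `reconstruct`, and `patch_anomaly_scores` are hypothetical and do not come from the GATE-AD repository.

```python
import numpy as np


def grid_adjacency(h, w):
    """Boolean adjacency of an h x w patch grid: 4-connected neighbours plus self-loops (assumed graph)."""
    n = h * w
    adj = np.eye(n, dtype=bool)  # self-loops, as in the standard GAT formulation
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:  # right neighbour
                adj[i, i + 1] = adj[i + 1, i] = True
            if r + 1 < h:  # neighbour below
                adj[i, i + w] = adj[i + w, i] = True
    return adj


def gat_layer(x, adj, W, a, slope=0.2):
    """Single-head graph attention layer (Velickovic et al., 2018)."""
    z = x @ W                                  # (n, d_out) projected node features
    d_out = z.shape[1]
    e = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]  # e_ij = a^T [z_i || z_j]
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(adj, e, -np.inf)              # attend only to graph neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over each node's neighbours
    return alpha @ z


def cosine(x, y, eps=1e-8):
    """Row-wise cosine similarity between two (n, d) arrays."""
    return (x * y).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + eps)


def scaled_cosine_error(x, x_rec, gamma=2.0):
    """GraphMAE-style SCE: mean over nodes of (1 - cos(x_i, x_rec_i)) ** gamma, with gamma >= 1."""
    residual = np.clip(1.0 - cosine(x, x_rec), 0.0, 2.0)
    return float(np.mean(residual ** gamma))


def reconstruct(x, adj, params, mask=None, mask_token=None):
    """Hypothetical stacked-GAT encoder; masked patch nodes are replaced by a mask token."""
    h = x.copy()
    if mask is not None:
        h[mask] = mask_token  # hide the masked patches from the encoder
    for k, (W, a) in enumerate(params):
        h = gat_layer(h, adj, W, a)
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU between stacked layers
    return h


def patch_anomaly_scores(x, adj, params):
    """Assumed scoring rule: per-patch reconstruction residual 1 - cos (high = likely defect)."""
    return 1.0 - cosine(x, reconstruct(x, adj, params))


# Toy usage with random, untrained weights (real backbone features would replace `feats`).
rng = np.random.default_rng(0)
H, W_GRID, D = 4, 4, 8
feats = rng.normal(size=(H * W_GRID, D))  # stand-in for patch-level feature tokens
adj = grid_adjacency(H, W_GRID)
params = [
    (rng.normal(scale=0.3, size=(D, 16)), rng.normal(size=2 * 16)),
    (rng.normal(scale=0.3, size=(16, D)), rng.normal(size=2 * D)),
]
mask = np.zeros(H * W_GRID, dtype=bool)
mask[rng.choice(H * W_GRID, size=8, replace=False)] = True  # mask half of the patches

x_rec = reconstruct(feats, adj, params, mask=mask, mask_token=np.zeros(D))
loss = scaled_cosine_error(feats[mask], x_rec[mask])  # training loss on masked nodes only
scores = patch_anomaly_scores(feats, adj, params)       # (H * W_GRID,) anomaly map
image_score = float(scores.max())                      # max-pooled image-level score
```

In a trained model, the GAT weights (and the mask token) would be fitted by minimizing the SCE on the few normal samples. Patches that the encoder cannot reconstruct from their neighbours would then receive high residuals. Raising γ above 1 down-weights patches that are already well reconstructed. This is the usual motivation for SCE over a plain cosine or MSE loss.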

Paper Structure (51 sections, 5 equations, 17 figures, 6 tables)

Figures (17)

  • Figure 1: General architecture of the proposed masked, representation-aligned GAT encoding scheme for FS-IVAD.
  • Figure 2: Indicative $1$-shot defect detection results. GATE-AD localizes defects more accurately and robustly across multiple pose, placement, and abnormality-type settings.
  • Figure 3: FS-IVAD detection performance (image-level AUROC) versus per-image inference time (ms), under the $1$-shot setting on the MVTec AD benchmark.
  • Figure 4: Effect of the input image resolution on the MVTec AD and VisA benchmarks.
  • Figure 5: Effect of the backbone on the MVTec AD, VisA, and MPDD benchmarks.