GATE-AD: Graph Attention Network Encoding For Few-Shot Industrial Visual Anomaly Detection

Aggelos Psiris; Yannis Panagakis; Maria Vakalopoulou; Georgios Th. Papadopoulos

Abstract

Few-Shot Industrial Visual Anomaly Detection (FS-IVAD) is a critical task in modern manufacturing settings, where automated product inspection systems must identify rare defects using only a handful of normal (defect-free) training samples. In this context, the current study introduces GATE-AD, a novel reconstruction-based approach. In particular, the proposed framework employs a masked, representation-aligned Graph Attention Network (GAT) encoding scheme to learn robust appearance patterns of normal samples. Using dense, patch-level visual feature tokens as graph nodes, the model applies stacked self-attentional layers to adaptively encode complex, irregular, non-Euclidean local relations. The graph is further enhanced with a representation alignment component grounded on a learnable latent space, in which areas of high reconstruction residual (i.e., defects) are assessed using a Scaled Cosine Error (SCE) objective function. Extensive comparative evaluation on the MVTec AD, VisA, and MPDD industrial defect detection benchmarks demonstrates that GATE-AD achieves state-of-the-art performance across the $1$- to $8$-shot settings, combining the highest detection accuracy (an image AUROC increase of up to $1.8\%$ in the 8-shot setting on MPDD) with the lowest per-image inference latency (at least $25.05\%$ faster), compared to the best-performing methods in the literature. To facilitate reproducibility and further research, the source code of GATE-AD is available at https://github.com/gthpapadopoulos/GATE-AD.
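To make the pipeline described in the abstract more concrete, the following NumPy sketch walks through one forward pass: patch tokens become graph nodes, a fraction of the nodes is masked, two stacked single-head graph attention layers (in the style of Veličković et al.'s GAT) reconstruct the features, the Scaled Cosine Error is computed on the masked nodes (using the GraphMAE definition, the mean of $(1 - \cos)^\gamma$), and a per-patch cosine residual serves as an anomaly map. This is an illustrative sketch under loudly stated assumptions, not the GATE-AD implementation. The 8-neighbour grid graph, the 25% mask ratio, the zero mask token, $\gamma = 2$, the ReLU between layers, the random untrained weights, and the max-pooled image score are all hypothetical choices. The representation alignment component is omitted. The repository linked above is the reference for the actual method.

```python
import numpy as np


def grid_adjacency(h, w):
    """Hypothetical graph: each patch links to itself and its 8-neighbourhood."""
    n = h * w
    adj = np.zeros((n, n), dtype=bool)
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        adj[r * w + c, rr * w + cc] = True
    return adj


def gat_layer(x, adj, weight, att, slope=0.2):
    """Single-head graph attention layer in the style of Velickovic et al."""
    z = x @ weight                                   # projected node features
    d = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term
    e = (z @ att[:d])[:, None] + (z @ att[d:])[None, :]
    e = np.where(e > 0, e, slope * e)                # LeakyReLU
    e = np.where(adj, e, -np.inf)                    # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)             # stable softmax over j
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z                                 # attention-weighted sum


def cosine(x, z, eps=1e-8):
    """Row-wise cosine similarity between two (n, d) arrays."""
    num = np.sum(x * z, axis=1)
    return num / (np.linalg.norm(x, axis=1) * np.linalg.norm(z, axis=1) + eps)


def scaled_cosine_error(x, z, gamma=2.0):
    """GraphMAE-style SCE: mean over nodes of (1 - cos(x_i, z_i)) ** gamma."""
    return float(np.mean((1.0 - cosine(x, z)) ** gamma))


rng = np.random.default_rng(0)
H, W, C = 4, 4, 16                # hypothetical 4x4 patch grid, 16-dim tokens
n = H * W
x = rng.normal(size=(n, C))       # stand-in for backbone patch features
adj = grid_adjacency(H, W)

# Mask 25% of the nodes (hypothetical ratio) and replace them with a mask token.
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=n // 4, replace=False)] = True
mask_token = np.zeros(C)          # hypothetical; would be learnable in training
x_in = np.where(mask[:, None], mask_token, x)

# Two stacked GAT layers with random, untrained weights (data flow only).
w1, a1 = rng.normal(scale=C ** -0.5, size=(C, C)), rng.normal(size=2 * C)
w2, a2 = rng.normal(scale=C ** -0.5, size=(C, C)), rng.normal(size=2 * C)
hidden = np.maximum(gat_layer(x_in, adj, w1, a1), 0.0)  # ReLU between layers
z = gat_layer(hidden, adj, w2, a2)                      # reconstruction of x

loss = scaled_cosine_error(x[mask], z[mask])  # objective on masked nodes only
scores = 1.0 - cosine(x, z)                   # per-patch residual map
image_score = float(scores.max())             # hypothetical image-level score
print(f"SCE loss: {loss:.3f}, image score: {image_score:.3f}")
```

With random weights the printed numbers carry no meaning; in a trained model the weights would be fitted on the few normal shots only, so that patches the encoder cannot reconstruct well (high residual) stand out as candidate defects. Raising $1 - \cos$ to a power $\gamma > 1$ down-weights nodes that are already reconstructed well, which is the usual motivation given for SCE over a plain cosine loss.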

Paper Structure (51 sections, 5 equations, 17 figures, 6 tables)

Introduction
Related Work
Visual feature embeddings and memory banks.
Vision-language model reasoning and adaptation.
Reconstruction-based frameworks and prototype refinement.
Proposed approach
Overview
Image patchification and feature extraction
Masked graph attention network encoder
Encoder type selection.
Encoder network architecture.
Graph attentional layer.
Representation alignment.
Feature masking.
Graph regularization.
...and 36 more sections

Figures (17)

Figure 1: General architecture of the proposed masked, representation-aligned GAT encoding scheme for FS-IVAD.
Figure 2: Indicative $1$-shot defect detection results. GATE-AD localizes defects more accurately and robustly across multiple pose, placement, and abnormality-type settings.
Figure 3: FS-IVAD detection performance (image-level AUROC) versus per-image inference time (ms), under the $1$-shot setting on the MVTec AD benchmark.
Figure 4: Effect of the input image resolution on the MVTec AD and VisA benchmarks.
Figure 5: Effect of the backbone on the MVTec AD, VisA, and MPDD benchmarks.
...and 12 more figures
