Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection
Fenfang Tao, Guo-Sen Xie, Fang Zhao, Xiangbo Shu
TL;DR
The paper tackles few-shot anomaly detection by moving beyond handcrafted text prompts and simple image adapters to exploit cross-layer visual context. It introduces KAG-prompt, which combines a kernel-aware hierarchical graph (KAHG) built from multi-kernel per-layer features with a memory bank and a multi-level information fusion (MIF) module to produce robust pixel- and image-level anomaly scores. Pixel-level maps from text alignment ($M_p$) and from the memory bank ($M_v$) are fused into a single map $M$, while the image-level score combines a CLS-alignment term $s_1$ and a top-$k$ fusion term $s_2$ to yield the final score $s$. The approach achieves state-of-the-art FSAD performance on MVTecAD and VisA, with ablations and visualizations supporting the contribution of cross-layer graph reasoning and multi-signal fusion, which suggests practical potential for automated, low-data anomaly detection in industrial settings.
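The two-level scoring described above can be illustrated with a minimal NumPy sketch. The summary does not give the exact fusion rules, so everything here is an assumption made for illustration: the convex combination of $M_p$ and $M_v$, the reading of $s_2$ as the mean of the $k$ largest values of $M$, the weighted sum that forms $s$, and the names `fuse_pixel_maps`, `top_k_score`, `image_score`, `weight`, and `alpha`.

```python
import numpy as np


def fuse_pixel_maps(m_p, m_v, weight=0.5):
    """Blend the text-alignment map M_p with the memory-based map M_v.

    A convex combination is assumed here; the paper may fuse differently.
    """
    return weight * m_p + (1.0 - weight) * m_v


def top_k_score(m, k):
    """Mean of the k largest values of the fused map M (assumed form of s_2)."""
    flat = m.ravel()
    k = max(1, min(k, flat.size))
    # After partitioning, the last k entries are the k largest values.
    top = np.partition(flat, flat.size - k)[flat.size - k:]
    return float(top.mean())


def image_score(s1, m, k=10, alpha=0.5):
    """Combine the CLS-alignment term s_1 with the top-k term s_2 (assumed weighting)."""
    s2 = top_k_score(m, k)
    return alpha * s1 + (1.0 - alpha) * s2


# Toy example: a single anomalous pixel seen by both maps.
m_p = np.zeros((4, 4))
m_v = np.zeros((4, 4))
m_p[1, 1] = 1.0
m_v[1, 1] = 0.5
m = fuse_pixel_maps(m_p, m_v)
s = image_score(s1=0.25, m=m, k=1)
```

Taking the top-$k$ values rather than the global maximum is one common way to make an image-level score less sensitive to a single noisy pixel; whether the paper uses the mean of those values is not stated in this summary.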
Abstract
Few-shot anomaly detection (FSAD) aims to detect unseen anomalous regions under the guidance of very few normal support images from the same class. Under the prevailing large vision-language model paradigm, existing FSAD methods usually find anomalies by designing complex text prompts and aligning them with visual features. However, these methods almost always neglect the intrinsic contextual information in visual features, e.g., the interactions between different vision layers, which is an important clue for detecting anomalies comprehensively. To this end, we propose a kernel-aware graph prompt learning framework, termed KAG-prompt, which reasons about the cross-layer relations among visual features for FSAD. Specifically, a kernel-aware hierarchical graph is built by taking features from different layers, which focus on anomalous regions of different sizes, as nodes, while the relationships between arbitrary pairs of nodes serve as the edges of the graph. By passing messages over this graph, KAG-prompt captures cross-layer contextual information, leading to more accurate anomaly prediction. Moreover, to integrate multiple important anomaly signals in the prediction map, we propose a novel image-level scoring method based on multi-level information fusion. Extensive experiments on the MVTecAD and VisA datasets show that KAG-prompt achieves state-of-the-art FSAD results for image-level and pixel-level anomaly detection. Code is available at https://github.com/CVL-hub/KAG-prompt.git.
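The idea of message passing over a graph whose nodes are layer features and whose edges connect arbitrary pairs of nodes can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: each node is reduced to a single feature vector, edge weights are taken as softmax-normalized cosine similarities, the update is a residual sum, and the multi-kernel feature extraction is omitted. The names `message_passing` and `temperature` are hypothetical.

```python
import numpy as np


def message_passing(nodes, temperature=1.0):
    """One round of message passing on a fully connected graph.

    nodes: array of shape (num_nodes, dim), one feature vector per layer.
    Returns the updated node features and the row-normalized edge weights.
    """
    # Cosine similarity between every pair of nodes (assumed edge definition).
    norms = np.linalg.norm(nodes, axis=1, keepdims=True)
    unit = nodes / np.maximum(norms, 1e-12)
    sim = unit @ unit.T / temperature

    # Row-wise softmax so each node's incoming weights sum to one.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Residual update: keep each node's own feature, add aggregated context.
    updated = nodes + weights @ nodes
    return updated, weights


# Toy example: four "layers", each summarized by an 8-dimensional feature.
rng = np.random.default_rng(0)
layer_features = rng.normal(size=(4, 8))
context_features, edge_weights = message_passing(layer_features)
```

After one round, every node's feature already mixes in information from all other layers, which is the sense in which such a graph carries cross-layer context.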
