Commonality in Few: Few-Shot Multimodal Anomaly Detection via Hypergraph-Enhanced Memory
Yuxuan Lin, Hanjing Yan, Xuan Tong, Yang Chang, Huanzhen Wang, Ziheng Zhou, Shuyong Gao, Yan Wang, Wenqiang Zhang
TL;DR
This work tackles few-shot multimodal industrial anomaly detection by introducing CIF, a hypergraph-based framework that extracts intra-class structural commonality from a limited number of normal examples. CIF constructs semantic-aware hypergraphs to guide memory-bank formation (SGMS), applies a training-free bidirectional hypergraph message passing module (Bi-TF-MP) to align test features with memory features, and then uses hyperedge-guided memory search (HGMS) to reduce false positives. On MVTec 3D-AD and Eyecandies, CIF achieves state-of-the-art I-AUROC in 1-, 2-, and 4-shot settings, demonstrating strong transferability and robustness with limited data. The approach relies on structured, higher-order relationships to improve few-shot detection and localization in industrial contexts, and the code is publicly available for replication and extension.
Abstract
Few-shot multimodal industrial anomaly detection is a critical yet underexplored task, offering the ability to adapt quickly to complex industrial scenarios. In few-shot settings, the small number of training samples often fails to cover the diverse patterns present in test samples. This challenge can be mitigated by extracting structural commonality from the few training samples that are available. In this paper, we propose CIF (Commonality In Few), a novel few-shot unsupervised multimodal industrial anomaly detection method based on structural commonality. To extract intra-class structural information, we employ hypergraphs, which can model higher-order correlations, to capture the structural commonality within training samples, and we use a memory bank to store this intra-class structural prior. First, we design a semantic-aware hypergraph construction module tailored to single-semantic industrial images, from which we extract common structures to guide the construction of the memory bank. Second, we use a training-free hypergraph message passing module to update the visual features of test samples, reducing the distribution gap between test features and the features in the memory bank. Third, we propose a hyperedge-guided memory search module, which uses structural information to assist the memory search process and reduce the false positive rate. Experimental results on the MVTec 3D-AD and Eyecandies datasets show that our method outperforms state-of-the-art (SOTA) methods in few-shot settings. Code is available at https://github.com/Sunny5250/CIF.
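
To make the pipeline described above more concrete, the following is a minimal, generic sketch of a training-free hypergraph smoothing step followed by a memory-bank lookup. It is not the authors' implementation (see the linked repository for that). Every name here (`knn_hyperedges`, `message_passing`, `anomaly_scores`) is hypothetical, the k-nearest-neighbour hyperedges are only a stand-in for the semantic-aware construction, the message passing runs within a single feature set rather than bidirectionally between test and memory features, and the lookup is a plain nearest-neighbour search without hyperedge guidance.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def knn_hyperedges(feats, k):
    """Assumed construction: hyperedge j holds node j and its k nearest neighbours."""
    edges = []
    for j, f in enumerate(feats):
        others = sorted(
            (i for i in range(len(feats)) if i != j),
            key=lambda i: sq_dist(f, feats[i]),
        )
        edges.append([j] + others[:k])
    return edges


def message_passing(feats, edges):
    """One parameter-free round: nodes -> hyperedges -> nodes, using plain averages."""
    # Each hyperedge feature is the mean of its member nodes.
    edge_feats = [mean([feats[i] for i in e]) for e in edges]
    # Each node is updated with the mean of the hyperedges that contain it.
    incident = [[] for _ in feats]
    for e_idx, e in enumerate(edges):
        for i in e:
            incident[i].append(edge_feats[e_idx])
    return [mean(v) for v in incident]


def anomaly_scores(test_feats, memory):
    """Score each test feature by its distance to the nearest memory-bank entry."""
    return [math.sqrt(min(sq_dist(t, m) for m in memory)) for t in test_feats]


# Toy usage: three "normal" patch features form the memory bank, one outlier is tested.
memory = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
test = memory + [[5.0, 5.0]]
smoothed = message_passing(test, knn_hyperedges(test, k=1))
scores = anomaly_scores(smoothed, memory)
```

Because both passes are simple averages, the update has no learnable parameters, which is the sense in which such a step can be called training-free. In this toy setup the outlier keeps a much larger score than the normal patches after smoothing.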
