RAID: Retrieval-Augmented Anomaly Detection
Mingxiu Cai, Zhe Zhang, Gaochang Wu, Tianyou Chai, Xiatian Zhu
TL;DR
This work reframes unsupervised anomaly detection (UAD) as a retrieval-augmented problem. It introduces RAID, which retrieves hierarchical class-, semantic-, and instance-level templates and uses a guided Mixture-of-Experts (MoE) generator to suppress matching noise. By combining a coarse-to-fine retrieval pipeline with a two-stage filtering mechanism, RAID achieves robust pixel-level anomaly localization and generalizes well across full-shot, few-shot, and multi-dataset settings on MVTec-AD, VisA, MPDD, and BTAD. The approach achieves state-of-the-art performance, gains efficiency from hierarchical retrieval, and applies broadly, including integration with reconstruction-based methods. Overall, RAID advances UAD by combining principled retrieval with noise-aware generation, enabling scalable, data-efficient industrial anomaly detection with precise localization.
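The coarse-to-fine retrieval idea can be illustrated with a minimal sketch. It assumes a toy in-memory database in which each class has one prototype feature, each class is split into semantic clusters with centroids, and each cluster holds instance-level features. The names `ClassEntry`, `SemanticCluster`, and `retrieve` are hypothetical, and cosine similarity over hand-made vectors stands in for learned features. This is not RAID's implementation; it only shows how narrowing from class to cluster to instance limits how much of the database a query is compared against.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SemanticCluster:
    centroid: Vector                                            # semantic-level key
    instances: dict[str, Vector] = field(default_factory=dict)  # instance id -> feature


@dataclass
class ClassEntry:
    prototype: Vector                                           # class-level key
    clusters: list[SemanticCluster] = field(default_factory=list)


def retrieve(db: dict[str, ClassEntry], query: Vector, k: int = 2) -> tuple[str, list[str]]:
    """Hypothetical coarse-to-fine lookup: class, then semantic cluster, then top-k instances."""
    # Level 1 (class): compare the query against one prototype per class.
    cls = max(db, key=lambda name: cosine(db[name].prototype, query))
    # Level 2 (semantic): compare only against the clusters of the chosen class.
    cluster = max(db[cls].clusters, key=lambda c: cosine(c.centroid, query))
    # Level 3 (instance): rank only the instances inside the chosen cluster.
    ranked = sorted(cluster.instances,
                    key=lambda iid: cosine(cluster.instances[iid], query),
                    reverse=True)
    return cls, ranked[:k]


# Toy database with hand-made 3-D features (stand-ins for learned embeddings).
db = {
    "bottle": ClassEntry(prototype=[1.0, 0.0, 0.0], clusters=[
        SemanticCluster(centroid=[1.0, 0.1, 0.0],
                        instances={"b1": [1.0, 0.2, 0.0], "b2": [0.9, 0.0, 0.1], "b3": [1.0, 0.1, 0.05]}),
        SemanticCluster(centroid=[1.0, -0.5, 0.0], instances={"b4": [1.0, -0.6, 0.0]}),
    ]),
    "screw": ClassEntry(prototype=[0.0, 1.0, 0.0], clusters=[
        SemanticCluster(centroid=[0.0, 1.0, 0.1], instances={"s1": [0.0, 1.0, 0.2]}),
    ]),
}

cls, top = retrieve(db, [1.0, 0.1, 0.0], k=2)
print(cls, top)  # bottle ['b3', 'b1']
```

In this toy run only the three instances in the first bottle cluster are ranked; the second cluster and the whole screw class are pruned at the coarser levels. That pruning is where a hierarchical index can save computation compared with a flat search over every stored instance.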
Abstract
Unsupervised Anomaly Detection (UAD) aims to identify abnormal regions by establishing correspondences between test images and normal templates. Existing methods rely primarily on image reconstruction or template retrieval, but they share a fundamental challenge: matching test images to normal templates inevitably introduces noise due to intra-class variation, imperfect correspondences, and a limited number of templates. Observing that Retrieval-Augmented Generation (RAG) uses retrieved samples directly in the generation process, we reinterpret UAD through this lens and introduce RAID, a retrieval-augmented UAD framework designed for noise-resilient anomaly detection and localization. Whereas standard RAG uses retrieval to enrich context or knowledge, RAID uses the retrieved normal samples to guide noise suppression during anomaly map generation. RAID retrieves class-, semantic-, and instance-level representations from a hierarchical vector database, forming a coarse-to-fine retrieval pipeline. A matching cost volume correlates the input with the retrieved exemplars, and a guided Mixture-of-Experts (MoE) network then uses the retrieved samples to adaptively suppress matching noise and produce fine-grained anomaly maps. RAID achieves state-of-the-art performance across full-shot, few-shot, and multi-dataset settings on the MVTec, VisA, MPDD, and BTAD benchmarks. Repository: https://github.com/Mingxiu-Cai/RAID
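To make the matching cost volume concrete, here is a minimal sketch. It assumes each image is already encoded as a list of patch feature vectors and that matching uses cosine similarity; both are assumptions, as are the hypothetical helpers `cost_volume` and `raw_anomaly_map`. The raw map scores each test patch as one minus its best match over all retrieved exemplars, a simple nearest-neighbor baseline. The sketch deliberately omits the guided MoE network, whose role in RAID is to suppress the noise that this kind of raw matching produces.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cost_volume(test_patches: list[Vector], exemplars: list[list[Vector]]) -> list[list[list[float]]]:
    """Hypothetical helper: C[p][r][q] = similarity of test patch p and patch q of exemplar r."""
    return [[[cosine(t, e) for e in exemplar] for exemplar in exemplars] for t in test_patches]


def raw_anomaly_map(volume: list[list[list[float]]]) -> list[float]:
    """Per test patch: 1 - best similarity over every exemplar and every exemplar patch."""
    return [1.0 - max(max(row) for row in per_patch) for per_patch in volume]


# Toy example: 2-D "features", two retrieved normal exemplars with two patches each.
exemplars = [
    [[1.0, 0.0], [0.7, 0.7]],   # exemplar A
    [[0.9, 0.1], [1.0, 0.05]],  # exemplar B
]
test_patches = [[1.0, 0.0], [0.0, 1.0]]  # patch 0 resembles normal data, patch 1 does not

volume = cost_volume(test_patches, exemplars)   # shape: 2 test patches x 2 exemplars x 2 patches
anomaly = raw_anomaly_map(volume)
print([round(s, 3) for s in anomaly])  # [0.0, 0.293]
```

Because each score depends on a single best match, one spurious correspondence can make an anomalous patch look normal, and a normal patch with no close template can look anomalous; this is the matching noise that the abstract attributes to intra-class variation, imperfect correspondences, and limited templates.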
