Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection
Peng Chen, Chao Huang, Yunkang Cao, Chengliang Liu, Wenqiang Wang, Mingbo Yang, Li Shen, Wenqi Ren, Xiaochun Cao
TL;DR
Reason-IAD addresses explainable industrial anomaly detection by combining retrieval-augmented, category-specific knowledge with entropy-guided latent reasoning in a multimodal LLM framework. It retrieves the top-$k$ category descriptions to condition the model prompt and reasons in a compact latent space of think tokens $\mathcal{Z}$, which are updated with a REINFORCE-like objective whose reward is based on the predictive entropy $\mathcal{H}$. A dynamic visual injection mechanism, guided by iteration-based rewards, adds informative visual patches to the latent sequence so that reasoning concentrates on defect-relevant regions. On the MMAD benchmark, across seven subtasks, Reason-IAD achieves strong one-shot and zero-shot performance and surpasses several baselines while producing interpretable reasoning about industrial defects.
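To make the entropy-based reward concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical linear "answer head" (`weights`) that maps a small latent vector `z` to answer logits, defines the reward as the negative predictive entropy $-\mathcal{H}$, and applies a score-function (REINFORCE-style) update with Gaussian perturbations and a mean-reward baseline. The function names, the Gaussian perturbation scheme, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy H(p) in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def answer_probs(z, weights):
    """Hypothetical stand-in for the MLLM head: logits = weights @ z."""
    logits = [sum(w * x for w, x in zip(row, z)) for row in weights]
    return softmax(logits)


def reinforce_step(z, weights, rng, sigma=0.1, lr=0.5, samples=16):
    """One REINFORCE-like update of the latent tokens z.

    Candidates are sampled as z + sigma * eps; the reward is the negative
    predictive entropy, so confident predictions are rewarded (assumed form).
    """
    draws = []
    for _ in range(samples):
        eps = [rng.gauss(0.0, 1.0) for _ in z]
        cand = [x + sigma * e for x, e in zip(z, eps)]
        reward = -entropy(answer_probs(cand, weights))
        draws.append((reward, eps))
    baseline = sum(r for r, _ in draws) / samples  # variance reduction
    grad = [0.0] * len(z)
    for reward, eps in draws:
        for i, e in enumerate(eps):
            # Score function of a Gaussian policy with mean z: eps / sigma.
            grad[i] += (reward - baseline) * e / sigma
    return [x + lr * g / samples for x, g in zip(z, grad)]


def optimize_latent(z, weights, steps=100, seed=0):
    """Iterate the update for a fixed number of steps with a seeded RNG."""
    rng = random.Random(seed)
    for _ in range(steps):
        z = reinforce_step(z, weights, rng)
    return z
```

Calling `optimize_latent([0.5, 0.0], [[1.0, 0.0], [0.0, 1.0]])` in this toy setting pushes the latent vector toward a more confident two-way prediction, which is the behavior the entropy reward is meant to encourage.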
Abstract
Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. First, a retrieval-augmented knowledge module incorporates category-specific textual descriptions into the model input, enabling context-aware reasoning over domain-specific defects. Second, an entropy-driven latent reasoning mechanism conducts iterative exploration within a compact latent space using optimizable latent think tokens, guided by an entropy-based reward that encourages confident and stable predictions. Furthermore, a dynamic visual injection strategy selectively incorporates the most informative image patches into the latent sequence, directing the reasoning process toward regions critical for anomaly detection. Extensive experimental results demonstrate that Reason-IAD consistently outperforms state-of-the-art methods. The code will be publicly available at https://github.com/chenpeng052/Reason-IAD.
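The retrieval and visual-injection steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the released code: embeddings are plain lists of floats, relevance is scored by cosine similarity, and the helpers `top_k`, `build_prompt`, and `inject_patches` are hypothetical names. The abstract does not specify how descriptions or patches are scored, so the similarity-based selection here is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query, best first."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    return order[:k]


def build_prompt(question, descriptions, desc_embs, query_emb, k=2):
    """Prepend the top-k retrieved category descriptions to the question."""
    picked = top_k(query_emb, desc_embs, k)
    context = "\n".join(f"- {descriptions[i]}" for i in picked)
    return f"Category knowledge:\n{context}\nQuestion: {question}"


def inject_patches(latent_seq, patch_embs, state_emb, k=1):
    """Append the k patch embeddings most similar to the current reasoning state.

    Returns a new sequence; the input latent sequence is left unchanged.
    """
    picked = top_k(state_emb, patch_embs, k)
    return latent_seq + [patch_embs[i] for i in picked]
```

In a real pipeline the embeddings would come from the MLLM's text and vision encoders, and the selection could be repeated at each reasoning iteration so that newly relevant patches enter the latent sequence.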
