EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
Xiaomeng Peng, Xilang Huang, Seon Han Choi
TL;DR
This work tackles industrial anomaly detection without the expensive re-training of multimodal large language models (MLLMs). It introduces EAGLE, a tuning-free framework that couples a PatchCore-based expert with frozen MLLMs. Distribution-Based Thresholding decides when to inject expert prompts, and Confidence-Aware Attention Scaling reduces the model's reliance on textual priors that may be wrong. On MVTec-AD and VisA, EAGLE consistently improves detection accuracy and recall across multiple backbones, achieving results competitive with fine-tuned baselines. The study also reveals a strong link between correct predictions and attention focused on true defect regions, suggesting that expert-guided prompting can improve both performance and interpretability in industrial anomaly detection.
Abstract
Industrial anomaly detection is important for smart manufacturing, but many deep learning approaches produce only binary decisions and provide limited semantic explanations. Multimodal large language models (MLLMs) can potentially generate fine-grained, language-based analyses, yet existing methods often require costly fine-tuning and do not consistently improve anomaly detection accuracy compared to lightweight specialist detectors. We propose expert-augmented attention guidance for industrial anomaly detection in MLLMs (EAGLE), a tuning-free framework that integrates outputs from an expert model to guide MLLMs toward both accurate detection and interpretable anomaly descriptions. We further study how EAGLE affects MLLM internals by examining how attention in the intermediate layers is distributed over anomalous image regions. We observe that successful anomaly detection is associated with increased attention concentration on anomalous regions, and that EAGLE tends to encourage this alignment. Experiments on MVTec-AD and VisA show that EAGLE improves anomaly detection performance across multiple MLLMs without any parameter updates, achieving results comparable to fine-tuning-based methods. Code is available at https://github.com/shengtun/Eagle.
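To make the notion of "attention concentration on anomalous regions" concrete, the following is a minimal sketch of one way such a quantity could be measured: the fraction of total attention mass that falls on patches inside a ground-truth anomaly mask. The abstract does not specify the exact metric, so the function name `attention_concentration`, the patch-grid layout, and the toy values are illustrative assumptions, not the authors' implementation.

```python
def attention_concentration(attn, mask):
    """Fraction of total attention mass falling inside the anomaly mask.

    attn: 2D list of non-negative attention weights, one per image patch
          (assumed to be already aggregated over heads for one layer).
    mask: 2D list of booleans with the same shape; True marks anomalous patches.
    """
    total = 0.0
    inside = 0.0
    for attn_row, mask_row in zip(attn, mask):
        for weight, is_anomalous in zip(attn_row, mask_row):
            total += weight
            if is_anomalous:
                inside += weight
    # Guard against an all-zero attention map.
    return inside / total if total > 0 else 0.0


# Toy 4x4 patch grid (hypothetical values): the central 2x2 block is anomalous
# and receives four times the attention of each background patch.
attn = [
    [1, 1, 1, 1],
    [1, 4, 4, 1],
    [1, 4, 4, 1],
    [1, 1, 1, 1],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]

ratio = attention_concentration(attn, mask)
area_fraction = 4 / 16  # share of patches that are anomalous
print(f"attention on anomaly: {ratio:.3f}, area fraction: {area_fraction:.3f}")
```

A ratio well above the anomalous area fraction indicates that attention is concentrated on the defect rather than spread uniformly over the image; under the assumptions above, this is the kind of alignment the abstract associates with successful detection.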
