OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning

Shifang Zhao; Yiheng Lin; Lu Han; Yao Zhao; Yunchao Wei

TL;DR

OmniAD tackles industrial anomaly understanding by unifying detection and reasoning in a single multimodal framework. Text-as-Mask Encoding recasts segmentation as text generation, and Visual Guided Textual Reasoning builds the textual analysis on top of that visual evidence; the model is trained with a combination of SFT and GRPO. On MMAD and several anomaly-detection benchmarks, OmniAD achieves state-of-the-art or competitive results without manually selected thresholds. The result is an explainable system aimed at few-shot generalization, and the authors state that code and models will be released publicly.

Abstract

While anomaly detection has made significant progress, generating detailed analyses that incorporate industrial knowledge remains a challenge. To address this gap, we introduce OmniAD, a framework that unifies anomaly detection and understanding for fine-grained analysis. OmniAD is a multimodal reasoner that combines visual and textual reasoning. The visual reasoning stage provides detailed inspection: it uses Text-as-Mask Encoding to perform anomaly detection through text generation, without manually selected thresholds. Visual Guided Textual Reasoning then conducts a comprehensive analysis that integrates this visual perception. To enhance few-shot generalization, we employ an integrated training strategy that combines supervised fine-tuning (SFT) with reinforcement learning (GRPO), using three reward functions. Experiments show that OmniAD scores 79.1 on the MMAD benchmark, surpassing models such as Qwen2.5-VL-7B and GPT-4o. It also shows strong results across multiple anomaly detection benchmarks. These results highlight the importance of enhancing visual perception for effective reasoning in anomaly understanding. All code and models will be made publicly available.
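The abstract names GRPO as the reinforcement learning stage but does not describe the three reward functions. As a minimal sketch of the general GRPO idea only, the snippet below normalizes each sampled response's reward against the other responses drawn for the same prompt. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Generic GRPO-style advantage: (reward - group mean) / group std.
    `eps` (an assumed constant) avoids division by zero when every
    response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four responses sampled from one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to provide a baseline.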
