AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection

Yuhao Chao, Jie Liu, Jie Tang, Gangshan Wu

TL;DR

AnomalyR1 introduces an end-to-end multimodal large language model framework for industrial anomaly detection that leverages Group Relative Policy Optimization (GRPO) and a novel Reasoned Outcome Alignment Metric (ROAM) to overcome data scarcity. By building on VLM-R1 and outputting structured bounding-box information (later converted to masks by SAM2), the approach enables precise, explainable anomaly localization with minimal labeled data. ROAM complements GRPO by jointly evaluating the reasoning process and final outcome, yielding significant performance gains on the MMAD benchmark and strong generalization to unseen industrial datasets. The work demonstrates that a compact 3B model can achieve state-of-the-art results in multimodal IAD, highlighting the potential of end-to-end MLLMs for real-world industrial monitoring with limited defective samples.

Abstract

Industrial Anomaly Detection (IAD) poses a formidable challenge due to the scarcity of defective samples, making it imperative to deploy models capable of robust generalization to detect unseen anomalies effectively. Traditional approaches, often constrained by hand-crafted features or domain-specific expert models, struggle to address this limitation, underscoring the need for a paradigm shift. We introduce AnomalyR1, a pioneering framework that leverages VLM-R1, a Multimodal Large Language Model (MLLM) renowned for its exceptional generalization and interpretability, to revolutionize IAD. By integrating MLLM with Group Relative Policy Optimization (GRPO), enhanced by our novel Reasoned Outcome Alignment Metric (ROAM), AnomalyR1 achieves a fully end-to-end solution that autonomously processes inputs of image and domain knowledge, reasons through analysis, and generates precise anomaly localizations and masks. Based on the latest multimodal IAD benchmark, our compact 3-billion-parameter model outperforms existing methods, establishing state-of-the-art results. As MLLM capabilities continue to advance, this study is the first to deliver an end-to-end VLM-based IAD solution that demonstrates the transformative potential of ROAM-enhanced GRPO, positioning our framework as a forward-looking cornerstone for next-generation intelligent anomaly detection systems in industrial applications with limited defective data.

Paper Structure

This paper contains 20 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The process of group relative policy optimization: The model generates a set of responses for each prompt, scores them using a reward model, and updates its parameters based on the relative advantages within the group.
  • Figure 2: The structure of AnomalyR1. Enhanced by GRPO training, the model completes the whole end-to-end process, illustrating a new paradigm for IAD tasks.
  • Figure 3: GRPO with ROAM (AnomalyR1) shows a better reasoning process and gives the correct answer, while classical GRPO shows a conflict between its CoT and its final answer.
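
The group-relative step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the commonly used GRPO formulation, in which each response's reward is normalized by the mean and standard deviation of the rewards in its own group. The function name, the `eps` term, the choice of population standard deviation, and the example rewards are illustrative assumptions, not details taken from the paper, and the sketch does not model ROAM or the policy update itself.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses for the same prompt.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps), using the
    population standard deviation; real implementations may differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every response gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.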