Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation

SoYoung Park, Hyewon Lee, Mingyu Choi, Seunghoon Han, Jong-Ryul Lee, Sungsu Lim, Tae-Ho Kim

TL;DR

This work tackles zero-shot industrial anomaly segmentation (ZSAS), where the fixed prompts used by CLIP-based and SAM-based methods hinder cross-domain adaptability. It proposes Image-Aware Prompt Anomaly Segmentation (IAP-AS), which uses the image tagging model RAM and an LLM (LLaMA-3-8B) to generate context-aware adjectives that form adaptive prompts, then applies Grounding DINO for region localization and SAM for precise segmentation. The method defines the anomaly score as $Score_a=\sum_{i=1}^j s_i \cdot m_i$ and uses a size threshold $S_{\text{threshold}}$ to filter candidate regions. It operates as a two-stage pipeline (Preprocessing, then Anomaly Segmentation) and is evaluated on seven industrial datasets using AP and $F1$-max. IAP-AS improves $F1$-max by up to 10% over competitive baselines and shows strong cross-domain generalization without retraining. The authors release their code and discuss future directions, including optimized image-recognition prompts and deployment in complex real-world environments.
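The score formula can be illustrated with a minimal NumPy sketch. This summary does not define the symbols, so the sketch assumes that $s_i$ is the confidence of the $i$-th candidate region, that $m_i$ is its binary segmentation mask, and that $S_{\text{threshold}}$ is a maximum allowed mask area expressed as a fraction of the image. The function name and the exact filtering rule are hypothetical, not taken from the authors' released code.

```python
import numpy as np


def anomaly_score_map(scores, masks, size_threshold):
    """Sum score-weighted masks: Score_a = sum_i s_i * m_i.

    scores         : confidences s_i, one per candidate (assumed meaning)
    masks          : 2-D boolean arrays m_i, all of the same shape (assumed meaning)
    size_threshold : maximum mask area as a fraction of the image; larger
                     candidates are dropped (assumed filtering rule)
    """
    masks = [np.asarray(m, dtype=bool) for m in masks]
    if not masks:
        raise ValueError("at least one candidate mask is required")

    score_map = np.zeros(masks[0].shape, dtype=np.float64)
    for s, m in zip(scores, masks):
        # m.mean() is the fraction of pixels covered by this candidate.
        if m.mean() > size_threshold:
            continue  # likely the whole object rather than a defect
        score_map += s * m  # adds s_i to every pixel inside m_i
    return score_map


# Toy example on a 4x4 image with three candidates.
m1 = np.zeros((4, 4), dtype=bool)
m1[:2, :2] = True                      # covers 25% of the image
m2 = np.ones((4, 4), dtype=bool)       # covers 100%, should be filtered
m3 = np.zeros((4, 4), dtype=bool)
m3[0, 0] = m3[3, 3] = True             # covers 12.5%

score_map = anomaly_score_map([0.9, 0.8, 0.5], [m1, m2, m3], size_threshold=0.5)
```

Pixels covered by several surviving candidates accumulate their scores, so overlapping detections reinforce each other in the resulting map.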

Abstract

Anomaly segmentation is essential for industrial quality, maintenance, and stability. Existing text-guided zero-shot anomaly segmentation models are effective but rely on fixed prompts, limiting adaptability in diverse industrial scenarios. This highlights the need for flexible, context-aware prompting strategies. We propose Image-Aware Prompt Anomaly Segmentation (IAP-AS), which enhances anomaly segmentation by generating dynamic, context-aware prompts using an image tagging model and a large language model (LLM). IAP-AS extracts object attributes from images to generate context-aware prompts, improving adaptability and generalization in dynamic and unstructured industrial environments. In our experiments, IAP-AS improves the F1-max metric by up to 10%, demonstrating superior adaptability and generalization. It provides a scalable solution for anomaly segmentation across industries.
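For readers unfamiliar with the F1-max metric, the sketch below shows one common way to compute it: sweep every distinct threshold over a pixel-level anomaly score map and keep the best F1. The paper's exact evaluation protocol (for example, per-image versus dataset-level aggregation) is not stated in this summary, so this is an illustrative assumption; the function name `f1_max` is hypothetical.

```python
import numpy as np


def f1_max(score_map, gt_mask):
    """Best pixel-level F1 over all thresholds of the score map."""
    scores = np.asarray(score_map, dtype=np.float64).ravel()
    labels = np.asarray(gt_mask, dtype=bool).ravel()

    # Sort pixels by descending score; predicting the top-k pixels as
    # anomalous corresponds to thresholding at the k-th score.
    order = np.argsort(-scores, kind="stable")
    scores, labels = scores[order], labels[order]

    tp = np.cumsum(labels)    # true positives among the top-k pixels
    fp = np.cumsum(~labels)   # false positives among the top-k pixels

    # Evaluate only at the last pixel of each run of equal scores, so that
    # tied pixels always fall on the same side of the threshold.
    cut = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tp, fp = tp[cut], fp[cut]
    fn = labels.sum() - tp

    f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= 1 at every cut
    return float(f1.max())


# Toy example: four pixels, two of them truly anomalous.
toy_scores = np.array([0.9, 0.8, 0.3, 0.1])
toy_labels = np.array([1, 0, 1, 0])
best = f1_max(toy_scores, toy_labels)
```

In the toy example the best threshold admits the top three pixels (two true positives, one false positive, no misses), giving an F1 of 0.8.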

Paper Structure

This paper contains 25 sections, 4 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Example of the text prompt generation process for an industrial image, where object tags extracted from the image are combined with Image-Aware Prompt (IAP) and processed by an LLM to create context-aware prompts for anomaly segmentation.
  • Figure 2: Overview of the proposed IAP-AS framework, which operates in two stages: Preprocessing and Anomaly Segmentation. The Preprocessing stage includes image tagging, size threshold extraction, and LLM-based prompt generation. The Anomaly Segmentation stage involves anomaly region detection, filtering, segmentation, and anomaly score computation.
  • Figure 3: Visual comparison of IAP-AS and other models across four datasets, highlighting differences in anomaly segmentation results.