Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation
SoYoung Park, Hyewon Lee, Mingyu Choi, Seunghoon Han, Jong-Ryul Lee, Sungsu Lim, Tae-Ho Kim
TL;DR
This work tackles zero-shot industrial anomaly segmentation (ZSAS), where the fixed prompts used by CLIP-based and SAM-based methods hinder cross-domain adaptability. It proposes Image-Aware Prompt Anomaly Segmentation (IAP-AS), which uses the image tagging model RAM and an LLM (LLaMA-3-8B) to generate context-aware adjectives that form adaptive prompts; Grounding DINO then localizes candidate regions and SAM produces precise segmentation masks. The method defines the anomaly score as $Score_a=\sum_{i=1}^j s_i \cdot m_i$ and uses a size threshold $S_{\text{threshold}}$ to filter candidates. The pipeline has two stages, Preprocessing and Anomaly Segmentation, and is evaluated on seven industrial datasets using AP and $F1$-max. It achieves up to 10% improvement in $F1$-max over competitive baselines and demonstrates strong cross-domain generalization without retraining. The authors release their code to support broader adoption and discuss future directions, including optimized image-recognition prompts and deployment considerations for complex real-world environments.
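As a rough illustration of the score $Score_a=\sum_{i=1}^j s_i \cdot m_i$ and the size filter, the sketch below makes several assumptions that the summary does not spell out: $s_i$ is taken to be the confidence of the $i$-th candidate region, $m_i$ its binary mask, and $S_{\text{threshold}}$ an upper bound on the fraction of the image a candidate may cover. The function names are hypothetical and are not taken from the authors' released code.

```python
from typing import List, Tuple

Mask = List[List[int]]          # binary mask: 1 = pixel belongs to the candidate
Candidate = Tuple[float, Mask]  # (s_i, m_i): assumed confidence and mask


def mask_area_fraction(mask: Mask) -> float:
    """Fraction of image pixels covered by the mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def filter_by_size(candidates: List[Candidate], size_threshold: float) -> List[Candidate]:
    """Keep candidates covering at most `size_threshold` of the image.

    Assumption: oversized candidates (e.g. a mask covering the whole object)
    are the ones discarded; the summary does not state the direction.
    """
    return [(s, m) for s, m in candidates if mask_area_fraction(m) <= size_threshold]


def anomaly_score_map(candidates: List[Candidate], height: int, width: int) -> List[List[float]]:
    """Pixel-wise Score_a = sum_i s_i * m_i over the given candidates."""
    score = [[0.0] * width for _ in range(height)]
    for s, m in candidates:
        for r in range(height):
            for c in range(width):
                score[r][c] += s * m[r][c]
    return score


# Toy 2x2 example with three candidates; the third covers the whole image.
candidates = [
    (0.5, [[1, 0], [0, 0]]),
    (0.25, [[1, 1], [0, 0]]),
    (0.9, [[1, 1], [1, 1]]),
]
kept = filter_by_size(candidates, size_threshold=0.5)
score = anomaly_score_map(kept, height=2, width=2)
```

In this toy case the full-image candidate is filtered out, and overlapping masks accumulate their scores at shared pixels.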
Abstract
Anomaly segmentation is essential for industrial quality, maintenance, and stability. Existing text-guided zero-shot anomaly segmentation models are effective but rely on fixed prompts, which limits their adaptability in diverse industrial scenarios. This highlights the need for flexible, context-aware prompting strategies. We propose Image-Aware Prompt Anomaly Segmentation (IAP-AS), which enhances anomaly segmentation by generating dynamic, context-aware prompts using an image tagging model and a large language model (LLM). IAP-AS extracts object attributes from images to generate these prompts, improving adaptability and generalization in dynamic and unstructured industrial environments. In our experiments, IAP-AS improves the F1-max metric by up to 10%, demonstrating superior adaptability and generalization. It provides a scalable solution for anomaly segmentation across industries.
