Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning

Hui-Yue Yang, Hui Chen, Ao Wang, Kai Chen, Zijia Lin, Yongliang Tang, Pengcheng Gao, Yuming Quan, Jungong Han, Guiguang Ding

TL;DR

This work tackles the domain shift problem when applying the Segment Anything Model (SAM) to industrial anomaly segmentation. The authors introduce Self-Perception Tuning (SPT), which comprises Self-Draft Tuning (SDT) — a draft-then-refine mask generation pipeline — and a Visual-Relation-Aware Adapter (VRA-Adapter) to inject relational information into decoding. By fine-tuning SAM with SDT and VRA-Adapter, the method achieves state-of-the-art performance across six industrial benchmarks, under multiple prompting modes, with consistent gains over zero-shot SAM and traditional PEFT baselines. The approach enhances SAM's perception of anomalies and their relationships, improving robustness and practical applicability for automated industrial inspection and quality control.

Abstract

Segment Anything Model (SAM) has made great progress in anomaly segmentation tasks due to its impressive generalization ability. However, existing methods that directly apply SAM through prompting often overlook the domain shift issue, where SAM performs well on natural images but struggles in industrial scenarios. Parameter-Efficient Fine-Tuning (PEFT) offers a promising solution, but it may yield suboptimal performance by not adequately addressing the perception challenges during adaptation to anomaly images. In this paper, we propose a novel Self-Perception Tuning (SPT) method, aiming to enhance SAM's perception capability for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process. Additionally, a visual-relation-aware adapter is introduced to improve the perception of discriminative relational information for mask generation. Extensive experimental results on several benchmark datasets demonstrate that our SPT method can significantly outperform baseline methods, validating its effectiveness.
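
The draft-then-refine idea in the abstract can be illustrated with a toy coarse-to-fine example. The sketch below is not the paper's method: it does not involve SAM, prompts, or any learned parameters. It assumes a precomputed per-pixel anomaly score grid, and the function names `draft_mask` and `refine_mask`, the block size, and both thresholds are hypothetical choices made only for illustration. The draft step marks whole blocks whose mean score is high, and the refine step keeps only the pixels inside the draft whose own score is high.

```python
from typing import List

Grid = List[List[float]]  # per-pixel anomaly scores (assumed to be given)
Mask = List[List[int]]    # binary mask, 1 = anomalous


def draft_mask(scores: Grid, block: int = 2, thresh: float = 0.3) -> Mask:
    """Coarse draft: mark a whole block as anomalous when its mean score exceeds thresh."""
    h, w = len(scores), len(scores[0])
    mask = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [
                (y, x)
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            ]
            mean = sum(scores[y][x] for y, x in cells) / len(cells)
            if mean > thresh:
                for y, x in cells:
                    mask[y][x] = 1
    return mask


def refine_mask(scores: Grid, draft: Mask, thresh: float = 0.5) -> Mask:
    """Refinement: keep only pixels inside the draft whose own score exceeds thresh."""
    return [
        [1 if d and s > thresh else 0 for s, d in zip(score_row, draft_row)]
        for score_row, draft_row in zip(scores, draft)
    ]


if __name__ == "__main__":
    # Hypothetical 4x4 score grid with a small high-score region on the right.
    scores = [
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.9, 0.9],
        [0.1, 0.1, 0.9, 0.2],
        [0.1, 0.1, 0.1, 0.1],
    ]
    draft = draft_mask(scores)
    refined = refine_mask(scores, draft)
    for row in refined:
        print(row)
```

In this toy setting the draft over-covers the anomaly on purpose, and the refinement step tightens it. The learned draft and refinement in SPT are described in the paper itself.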

Paper Structure

This paper contains 45 sections, 15 equations, 12 figures, and 11 tables.

Figures (12)

  • Figure 1: Illustration of the domain shift issue. SAM performs well on natural images but poorly on out-of-domain industrial anomaly images.
  • Figure 2: A comparison between SAM with PEFT methods and our promptable anomaly segmentation model with self-perception tuning. VRA denotes the VRA-Adapter.
  • Figure 3: Overview of the proposed Self-Perception Tuning (SPT) framework, which applies a self-draft tuning (SDT) strategy and visual-relation-aware adapters (VRA-Adapter) to enhance the perception ability of SAM. SDT consists of three phases, i.e., display, draft, and refine. VRA is an abbreviation for VRA-Adapter.
  • Figure 4: Examples comparing the components of SPT. We use $\mathbf{D}_{\text{draft}}$ and $\mathbf{D}_{\text{refine}}$ to analyse SDT, and SPT to analyse the VRA-Adapter. GT denotes the ground truth.
  • Figure 5: Qualitative analysis using different prompts. We provide examples with box-level prompts (left) and point-level prompts (right). GT means ground truth.
  • ...and 7 more figures