Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
Hui-Yue Yang, Hui Chen, Ao Wang, Kai Chen, Zijia Lin, Yongliang Tang, Pengcheng Gao, Yuming Quan, Jungong Han, Guiguang Ding
TL;DR
This work tackles the domain shift problem when applying the Segment Anything Model (SAM) to industrial anomaly segmentation. The authors introduce Self-Perception Tuning (SPT), which comprises Self-Draft Tuning (SDT) — a draft-then-refine mask generation pipeline — and a Visual-Relation-Aware Adapter (VRA-Adapter) to inject relational information into decoding. By fine-tuning SAM with SDT and VRA-Adapter, the method achieves state-of-the-art performance across six industrial benchmarks, under multiple prompting modes, with consistent gains over zero-shot SAM and traditional PEFT baselines. The approach enhances SAM's perception of anomalies and their relationships, improving robustness and practical applicability for automated industrial inspection and quality control.
Abstract
The Segment Anything Model (SAM) has made great progress in anomaly segmentation tasks due to its impressive generalization ability. However, existing methods that directly apply SAM through prompting often overlook the domain shift issue, where SAM performs well on natural images but struggles in industrial scenarios. Parameter-Efficient Fine-Tuning (PEFT) offers a promising solution, but it may yield suboptimal performance by not adequately addressing the perception challenges during adaptation to anomaly images. In this paper, we propose a novel Self-Perception Tuning (SPT) method, aiming to enhance SAM's perception capability for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process. Additionally, a visual-relation-aware adapter is introduced to improve the perception of discriminative relational information for mask generation. Extensive experimental results on several benchmark datasets demonstrate that our SPT method can significantly outperform baseline methods, validating its effectiveness.
