SAM Struggles in Concealed Scenes -- Empirical Study on Segment Anything
Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, Bowen Zhou, Luc Van Gool
TL;DR
This study evaluates the Segment Anything Model (SAM) on concealed-scene segmentation tasks, including camouflaged animals, industrial defects, and medical lesions, to reveal its unprompted segmentation behavior. It benchmarks SAM against transformer-based COS models on CAMO, COD10K, and NC4K, using an IoU-based mask selection strategy and standard segmentation metrics. Results show that while larger ViT backbones improve SAM’s scores, it remains substantially weaker than state-of-the-art COS methods, with notable qualitative failures in occluded and amorphous regions. The authors highlight limitations in open-set, high-precision contexts and suggest incorporating priors or domain knowledge as a path to improved performance, emphasizing the value of data-centric and knowledge-infused approaches for foundation models in vision.
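The IoU-based mask selection mentioned above can be illustrated with a minimal sketch. The summary does not spell out the procedure, so this assumes that, among the candidate masks SAM produces without prompts, the one with the highest intersection over union (IoU) against the ground-truth mask is kept for scoring. The function names `iou` and `select_best_mask` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def iou(pred, gt):
    """Intersection over union of two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; treat the overlap as zero by convention.
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def select_best_mask(candidates, gt):
    """Return (index, IoU) of the candidate mask that best overlaps gt.

    Assumption: `candidates` is a non-empty sequence of binary masks,
    e.g. the masks produced for one image in an unprompted setting.
    """
    if len(candidates) == 0:
        raise ValueError("at least one candidate mask is required")
    scores = [iou(mask, gt) for mask in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Because this selection consults the ground truth, it gives the model the benefit of the doubt: the reported score reflects the best candidate available, not a mask the model singled out on its own.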
Abstract
Segmenting anything is a ground-breaking step toward artificial general intelligence, and the Segment Anything Model (SAM) greatly advances foundation models for computer vision. We are excited to probe the performance traits of SAM and, in particular, to explore the situations in which it does not perform well. In this report, we choose three concealed scenes, i.e., camouflaged animals, industrial defects, and medical lesions, to evaluate SAM under unprompted settings. Our main observation is that SAM struggles in concealed scenes.
