SAM Struggles in Concealed Scenes -- Empirical Study on Segment Anything

Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, Bowen Zhou, Luc Van Gool

TL;DR

This study evaluates the Segment Anything Model (SAM) on concealed-scene segmentation tasks, including camouflaged animals, industrial defects, and medical lesions, to reveal its unprompted segmentation behavior. It benchmarks SAM against transformer-based COS models on CAMO, COD10K, and NC4K, using an IoU-based mask selection strategy and standard segmentation metrics. Results show that while larger ViT backbones improve SAM’s scores, it remains substantially weaker than state-of-the-art COS methods, with notable qualitative failures in occluded and amorphous regions. The authors highlight limitations in open-set, high-precision contexts and suggest incorporating priors or domain knowledge as a path to improved performance, emphasizing the value of data-centric and knowledge-infused approaches for foundation models in vision.
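The IoU-based mask selection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the unprompted model yields several binary candidate masks per image and that the candidate with the highest intersection-over-union (IoU) against the ground-truth mask is kept for evaluation. The function names `iou` and `select_best_mask` are hypothetical.

```python
import numpy as np


def iou(pred, gt):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; treat the overlap as zero by convention (assumption).
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def select_best_mask(candidates, gt):
    """Return (index, IoU) of the candidate mask that best overlaps the ground truth.

    Assumption: `candidates` is a non-empty sequence of binary masks, e.g. the
    per-image output of an unprompted ("segment everything") run.
    """
    if len(candidates) == 0:
        raise ValueError("no candidate masks to select from")
    scores = [iou(mask, gt) for mask in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy example: a 4x4 ground truth with a 2x2 object and three candidates.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True

cand_a = np.zeros((4, 4), dtype=bool)
cand_a[0:2, 0:2] = True   # overlaps the object in one pixel
cand_b = np.zeros((4, 4), dtype=bool)
cand_b[1:3, 1:4] = True   # covers the object plus two extra pixels
cand_c = np.zeros((4, 4), dtype=bool)  # empty mask

best_index, best_score = select_best_mask([cand_a, cand_b, cand_c], gt)
```

In the toy example the second candidate is selected, since it covers all four object pixels with a union of six. The selected mask would then be scored with the standard segmentation metrics against the same ground truth.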

Abstract

Segmenting anything is a ground-breaking step toward artificial general intelligence, and the Segment Anything Model (SAM) greatly fosters the foundation models for computer vision. We could not be more excited to probe the performance traits of SAM. In particular, exploring situations in which SAM does not perform well is interesting. In this report, we choose three concealed scenes, i.e., camouflaged animals, industrial defects, and medical lesions, to evaluate SAM under unprompted settings. Our main observation is that SAM looks unskilled in concealed scenes.

Paper Structure

This paper contains 8 sections, 3 figures, and 1 table.

Figures (3)

  • Figure 1: SAM [kirillov2023segment] fails to perceive the animals that are visually "hidden" in their natural surroundings. All the samples are from the COD10K dataset [fan2020camouflaged].
  • Figure 2: SAM [kirillov2023segment] is unskilled in detecting concealed defects in industrial scenes. These samples are taken from the KolektorSDD [tabernik2020segmentation], MagneticTile [huang2020surface], and MVTecAD [bergmann2021mvtec] datasets.
  • Figure 3: SAM [kirillov2023segment] fails to detect these lesion regions in various medical modalities. These samples cover the RGB colour modality from CVC-300 [bernal2012towards] (1st column); the MRI modality from BraTS2021 [baid2021rsna] (2nd column); and the CT modality from COVID-SemiSeg [fan2020infnet] (3rd column) and MSD [antonelli2022medical] (4th to 7th columns).