Quantifying the Limits of Segmentation Foundation Models: Modeling Challenges in Segmenting Tree-Like and Low-Contrast Objects
Yixin Zhang, Nicholas Konz, Kevin Kramer, Maciej A. Mazurowski
TL;DR
This paper identifies fundamental failure modes of segmentation foundation models (SFMs) when handling tree-like and low-contrast objects. It introduces two interpretable metrics, Contour Pixel Rate and Difference of Gini Impurity Deviation, to quantify object tree-likeness, and a textural separability metric based on early neural features to quantify how texturally distinct an object is from its background. Across carefully controlled synthetic and real datasets, the study shows strong, consistent correlations between these object characteristics and SFM IoU, and finds that fine-tuning the models does not eliminate the problems. The findings reveal that SFMs tend to over-segment tree-like objects and misclassify objects with low textural contrast, because patch-based attention interprets local structure as texture. This has important implications for model design and evaluation in applications that require robust segmentation of complex shapes and textures.
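Neither the TL;DR nor the abstract defines Contour Pixel Rate. The following minimal sketch assumes a hypothetical reading, not the paper's stated formula: the metric is the fraction of an object's foreground pixels that lie on its boundary. Under that assumption, it shows why thin, branching objects would score higher than compact ones.

```python
import numpy as np


def contour_pixel_rate(mask: np.ndarray) -> float:
    """HYPOTHETICAL definition: fraction of foreground pixels on the contour.

    A foreground pixel is treated as a contour pixel if at least one of its
    4-connected neighbours is background. Pixels outside the image count as
    background. This is an illustrative assumption, not the paper's formula.
    """
    fg = mask.astype(bool)
    n_fg = int(fg.sum())
    if n_fg == 0:
        return 0.0
    # Pad with background so objects touching the image border get a contour there.
    padded = np.pad(fg, 1, mode="constant", constant_values=False)
    # A pixel is interior only if it and all four neighbours are foreground.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]  # neighbour above
        & padded[2:, 1:-1]   # neighbour below
        & padded[1:-1, :-2]  # neighbour to the left
        & padded[1:-1, 2:]   # neighbour to the right
    )
    contour = fg & ~interior
    return float(contour.sum()) / n_fg
```

Under this assumed definition, a filled 10x10 square scores 0.36 (36 boundary pixels out of 100). A one-pixel-wide line scores 1.0, since every one of its pixels touches the background.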
Abstract
Image segmentation foundation models (SFMs) like Segment Anything Model (SAM) have achieved impressive zero-shot and interactive segmentation across diverse domains. However, they struggle to segment objects with certain structures, particularly those with dense, tree-like morphology and low textural contrast from their surroundings. These failure modes are crucial for understanding the limitations of SFMs in real-world applications. To systematically study this issue, we introduce interpretable metrics quantifying object tree-likeness and textural separability. On carefully controlled synthetic experiments and real-world datasets, we show that SFM performance (e.g., SAM, SAM 2, HQ-SAM) noticeably correlates with these factors. We attribute these failures to SFMs misinterpreting local structure as global texture, resulting in over-segmentation or difficulty distinguishing objects from similar backgrounds. Notably, targeted fine-tuning fails to resolve this issue, indicating a fundamental limitation. Our study provides the first quantitative framework for modeling the behavior of SFMs on challenging structures, offering interpretable insights into their segmentation capabilities.
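The abstract reports that SFM performance correlates with object tree-likeness and textural separability, but it does not name the correlation statistic. The following minimal sketch makes three assumptions: performance is per-object IoU between a predicted and a ground-truth binary mask, the statistic is Pearson correlation against a per-object metric, and two empty masks have an IoU of 1.0. None of these choices is confirmed by the text here.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks.

    ASSUMPTION: two empty masks are treated as a perfect match (IoU = 1.0).
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum()) / float(union)


def pearson_r(metric_values, iou_values) -> float:
    """Pearson correlation between a per-object metric and per-object IoU.

    ASSUMPTION: Pearson is used here for illustration; the paper's
    statistic is not specified in the abstract.
    """
    x = np.asarray(metric_values, dtype=float)
    y = np.asarray(iou_values, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])
```

Replacing `np.corrcoef` with a rank-based statistic such as `scipy.stats.spearmanr` would make the comparison insensitive to any monotone rescaling of the metric.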
