TextureSAM: Towards a Texture Aware Foundation Model for Segmentation

Inbal Cohen, Boaz Meivar, Peihan Tu, Shai Avidan, Gal Oren

TL;DR

TextureSAM addresses the shape bias of Segment Anything Models (SAM) by fine-tuning SAM-2 on a texture-augmented version of ADE20K generated with Compositional Neural Texture (CNT), so that the model delineates regions by texture. The approach introduces Textured-ADE20K, built by CNT texture transfer from the Describable Textures Dataset (DTD), and compares mild and strong augmentation. It reports improvements on the texture-centric benchmarks RWTD and STMD and analyzes the trade-offs on ADE20K semantic segmentation. The key findings are that TextureSAM reduces fragmentation and captures texture-defined boundaries better than SAM-2, and that moderate augmentation offers a practical balance between texture sensitivity and general segmentation. The authors state that the code and texture-augmented dataset will be made publicly available, which would enable testing in other domains where texture is the primary boundary cue.

Abstract

Segment Anything Models (SAM) have achieved remarkable success in object segmentation tasks across diverse datasets. However, these models are predominantly trained on large-scale semantic segmentation datasets, which introduce a bias toward object shape rather than texture cues in the image. This limitation is critical in domains such as medical imaging, material classification, and remote sensing, where texture changes define object boundaries. In this study, we investigate SAM's bias toward semantics over textures and introduce a new texture-aware foundation model, TextureSAM, which achieves superior segmentation in texture-dominant scenarios. To achieve this, we employ a novel fine-tuning approach that incorporates texture augmentation techniques, incrementally modifying training images to emphasize texture features. By leveraging a novel texture-altered version of the ADE20K dataset, we guide TextureSAM to prioritize texture-defined regions, thereby mitigating the inherent shape bias present in the original SAM model. Our extensive experiments demonstrate that TextureSAM significantly outperforms SAM-2 on both natural (+0.2 mIoU) and synthetic (+0.18 mIoU) texture-based segmentation datasets. The code and texture-augmented dataset will be publicly available.

Paper Structure

This paper contains 19 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Samples from the Textured-ADE20K dataset. Incremental changes in $\eta$ produce a gradual texture shift in the resulting image. For low $\eta$ values, most of the semantic information in the image is retained. For $\eta=1$, the instances are completely shifted towards the target textures.
  • Figure 2: Illustration of generating a textured image for dataset augmentation using Compositional Neural Texture (CNT; Tu et al., 2024).
  • Figure 3: Box plot comparing the number of predicted segments to the number of ground truth (GT) segments for the Synthetic Textured Masks Dataset (STMD). Results are grouped by the number of GT segments per image. SAM-2's fragmentation of textures is visible in the plot: it generates significantly more masks (segments).
  • Figure 4: Segmentation results on images from the ADE20K dataset (1st row). TextureSAM (3rd row) produces semantic segmentation comparable to that of the original SAM-2 (2nd row, 3rd row with modified inference parameters). TextureSAM's predictions align better with the GT, where entire textured regions (e.g., trees, walls) are marked as a single instance.
  • Figure 5: Segmentation results on the synthetic STMD dataset. The 1st row shows the original images; the following rows present the segmentations produced by the different models and the GT annotations. For this dataset, which carries no semantic content, TextureSAM's segmentation maps align better with the GT annotations, while SAM-2 fragments textured regions into individual elements.
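
The two quantities behind the comparisons above, mIoU and the number of predicted masks relative to the GT, can be illustrated with a small sketch. This is not the authors' evaluation code: the function names (`segments`, `best_match_miou`, `mask_count_ratio`) are hypothetical, and the matching rule (each GT segment is scored by its best-overlapping predicted segment) is an assumption about one common way to score class-agnostic masks. The paper's exact protocol may differ.

```python
from collections import defaultdict


def segments(label_map):
    """Group pixel coordinates by segment id for a 2D label map (list of rows)."""
    pixels = defaultdict(set)
    for r, row in enumerate(label_map):
        for c, seg_id in enumerate(row):
            pixels[seg_id].add((r, c))
    return pixels


def best_match_miou(gt_map, pred_map):
    """Average, over GT segments, of the highest IoU with any predicted segment.

    Assumed matching rule for illustration only.
    """
    gt_segs, pred_segs = segments(gt_map), segments(pred_map)
    ious = []
    for gt_pixels in gt_segs.values():
        best = max(
            len(gt_pixels & pred_pixels) / len(gt_pixels | pred_pixels)
            for pred_pixels in pred_segs.values()
        )
        ious.append(best)
    return sum(ious) / len(ious)


def mask_count_ratio(gt_map, pred_map):
    """Number of predicted segments divided by number of GT segments.

    Values above 1 indicate that textured regions were split into more pieces.
    """
    return len(segments(pred_map)) / len(segments(gt_map))


# Toy 4x4 example: two GT texture regions (left half = 0, right half = 1).
gt = [[0, 0, 1, 1] for _ in range(4)]

# A fragmented prediction that splits the right region into two masks.
fragmented = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

print(best_match_miou(gt, fragmented))   # 0.75
print(mask_count_ratio(gt, fragmented))  # 1.5
print(best_match_miou(gt, gt))           # 1.0
```

In the toy example, splitting one textured region into two masks halves the best IoU for that region and raises the mask count ratio above 1, which is the kind of fragmentation the Figure 3 and Figure 5 captions attribute to SAM-2.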