SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches

Subhadeep Koley; Viswanatha Reddy Gajjala; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Ayan Kumar Bhunia; Yi-Zhe Song

TL;DR

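SketchYourSeg proposes a mask-free framework that uses a single exemplar freehand sketch as a query to perform subjective segmentation across an entire image gallery. It combines a frozen FG-SBIR backbone with a pretrained foundation model (such as CLIP or DINOv2) and generates pixel-level masks through a differentiable sketch-guided process, without requiring pixel-level annotations. The training objective combines four losses, $L_{total} = L_{InfoNCE} + L_{SBIR} + L_{unpaired} + L_{reg}$, and supports multi-granular segmentation at the category, fine-grained, and part levels, with part-level segmentation obtained through sketch-partitioning augmentation. Experiments on Sketchy and Sketchy-Extended show significant improvements on both seen and unseen classes, indicating that this new human-interactive segmentation paradigm strikes a good balance between precision and efficiency.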
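As a rough illustration of how the four-term objective in the TL;DR could be assembled, the sketch below sums the terms without weights, as the formula is written. The `info_nce` function is the generic contrastive form (cross-entropy of the positive candidate over all candidates), not necessarily the paper's exact variant. The function names, the `temperature` default, and the scalar placeholders for $L_{SBIR}$, $L_{unpaired}$, and $L_{reg}$ are assumptions made for illustration.

```python
import math


def info_nce(similarities, positive_index, temperature=0.07):
    """Generic InfoNCE loss for one query.

    `similarities` holds query-to-candidate similarity scores and
    `positive_index` marks the matching candidate. The temperature
    default is illustrative, not a value taken from the paper.
    """
    logits = [s / temperature for s in similarities]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[positive_index]


def total_loss(l_info_nce, l_sbir, l_unpaired, l_reg):
    """Unweighted sum of the four terms, as written in the TL;DR formula."""
    return l_info_nce + l_sbir + l_unpaired + l_reg


# Hypothetical values: the last three terms are placeholder scalars.
contrastive = info_nce([0.9, 0.1, 0.2], positive_index=0)
loss = total_loss(contrastive, l_sbir=0.5, l_unpaired=0.3, l_reg=0.1)
```

In practice each term would be computed on batched tensors in an autodiff framework. Plain Python floats are used here only to keep the arithmetic visible.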

Abstract

We introduce SketchYourSeg, a novel framework that establishes freehand sketches as a powerful query modality for subjective image segmentation across entire galleries through a single exemplar sketch. Unlike text prompts that struggle with spatial specificity or interactive methods confined to single-image operations, sketches naturally combine semantic intent with structural precision. This unique dual encoding enables precise visual disambiguation for segmentation tasks where text descriptions would be cumbersome or ambiguous -- such as distinguishing between visually similar instances, specifying exact part boundaries, or indicating spatial relationships in composed concepts. Our approach addresses three fundamental challenges: (i) eliminating the need for pixel-perfect annotation masks during training with a mask-free framework; (ii) creating a synergistic relationship between sketch-based image retrieval (SBIR) models and foundation models (CLIP/DINOv2) where the former provides training signals while the latter generates masks; and (iii) enabling multi-granular segmentation capabilities through purpose-made sketch augmentation strategies. Our extensive evaluations demonstrate superior performance over existing approaches across diverse benchmarks, establishing a new paradigm for user-guided image segmentation that balances precision with efficiency.
