SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches
Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song
TL;DR
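SketchYourSeg proposes a mask-free framework that uses a single exemplar freehand sketch as the query to perform subjective segmentation across an entire image gallery. It combines a frozen FG-SBIR backbone with a pretrained foundation model (such as CLIP or DINOv2) and generates pixel-level masks through a differentiable sketch-guided process, without requiring pixel-level annotations. The training objective combines four losses, $L_{total} = L_{InfoNCE} + L_{SBIR} + L_{unpaired} + L_{reg}$, and supports multi-granular segmentation at the category, fine-grained, and part levels, with part-level segmentation enabled by a sketch-partitioning augmentation. Experiments on Sketchy and Sketchy-Extended show significant improvements on both seen and unseen classes, indicating that this new human-in-the-loop segmentation paradigm strikes a good trade-off between precision and efficiency.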
Abstract
We introduce SketchYourSeg, a novel framework that establishes freehand sketches as a powerful query modality for subjective image segmentation across entire galleries through a single exemplar sketch. Unlike text prompts that struggle with spatial specificity or interactive methods confined to single-image operations, sketches naturally combine semantic intent with structural precision. This unique dual encoding enables precise visual disambiguation for segmentation tasks where text descriptions would be cumbersome or ambiguous -- such as distinguishing between visually similar instances, specifying exact part boundaries, or indicating spatial relationships in composed concepts. Our approach addresses three fundamental challenges: (i) eliminating the need for pixel-perfect annotation masks during training with a mask-free framework; (ii) creating a synergistic relationship between sketch-based image retrieval (SBIR) models and foundation models (CLIP/DINOv2) where the former provides training signals while the latter generates masks; and (iii) enabling multi-granular segmentation capabilities through purpose-made sketch augmentation strategies. Our extensive evaluations demonstrate superior performance over existing approaches across diverse benchmarks, establishing a new paradigm for user-guided image segmentation that balances precision with efficiency.
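To make the four-term objective $L_{total} = L_{InfoNCE} + L_{SBIR} + L_{unpaired} + L_{reg}$ from the TL;DR more concrete, the following is a minimal sketch in plain Python. It shows only a textbook batch-wise InfoNCE loss between sketch and photo embeddings, plus the unweighted sum of the four terms. The function names `info_nce` and `total_loss`, the temperature value, and the choice of cosine similarity are assumptions made for illustration. The paper's actual definitions of each term, and any per-term weights, are not given in this summary and may differ.

```python
import math


def info_nce(sketch_emb, photo_emb, temperature=0.07):
    """Generic batch InfoNCE (illustrative, not the paper's exact loss).

    Row i of sketch_emb is assumed to match row i of photo_emb;
    every other row in the batch acts as a negative.
    """
    def normalise(v):
        # L2-normalise so that the dot product is cosine similarity.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    s = [normalise(v) for v in sketch_emb]
    p = [normalise(v) for v in photo_emb]

    loss = 0.0
    for i, si in enumerate(s):
        # Similarity of sketch i to every photo in the batch.
        logits = [sum(a * b for a, b in zip(si, pj)) / temperature for pj in p]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matched photo.
        loss += log_denom - logits[i]
    return loss / len(s)


def total_loss(l_info_nce, l_sbir, l_unpaired, l_reg):
    """Unweighted sum of the four terms, as written in the TL;DR."""
    return l_info_nce + l_sbir + l_unpaired + l_reg
```

When every sketch matches its own photo and is orthogonal to the others, `info_nce` approaches zero. When all embeddings are identical, it equals the log of the batch size, because the matched photo is then indistinguishable from the negatives.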
