ISCUTE: Instance Segmentation of Cables Using Text Embedding
Shir Kozlovsky, Omkar Joglekar, Dotan Di Castro
TL;DR
This paper tackles the challenge of identifying and segmenting Deformable Linear Objects (DLOs) such as cables, for which traditional perceptual cues like shape, color, and texture are weak. It introduces ISCUTE, an adapter that bridges CLIPSeg's text-conditioned segmentation with SAM's promptable segmentation, enabling text-prompted, one-shot DLO instance segmentation while keeping both foundation models frozen. A diverse, CAD-generated DLO dataset (~30k images) supports training and evaluation. The approach reaches a state-of-the-art mIoU of $91.21\%$ and generalizes strongly, zero-shot, to external DLO datasets. The work offers a practical, user-friendly, text-driven solution for DLO perception, identifies the upper bound on performance imposed by the underlying foundation models, and outlines future improvements to the classifier component and to broader applicability.
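The data flow described above (frozen text-conditioned model, trainable adapter, frozen promptable segmenter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the real CLIPSeg and SAM models are replaced by hypothetical NumPy stand-ins (`frozen_text_segmenter`, `frozen_promptable_segmenter`), and the adapter is assumed to emit a fixed number of candidate point prompts, each with a confidence score that a classifier threshold uses to keep or drop it. All function names, shapes, and the 0.5 threshold are illustrative.

```python
import numpy as np


def frozen_text_segmenter(image: np.ndarray, prompt: str) -> np.ndarray:
    """Hypothetical stand-in for a frozen CLIPSeg: returns a dense (H, W, D) embedding."""
    h, w, _ = image.shape
    rng = np.random.default_rng(len(prompt))  # deterministic dummy features
    return rng.standard_normal((h, w, 8))


def adapter(embedding: np.ndarray, num_prompts: int = 4,
            points_per_prompt: int = 3, seed: int = 0):
    """Hypothetical trainable adapter: maps the embedding to N candidate point
    prompts of shape (N, K, 2) in (x, y) order, plus one confidence score per
    candidate. Random values stand in for learned outputs."""
    h, w, _ = embedding.shape
    rng = np.random.default_rng(seed)
    points = rng.integers(0, [w, h], size=(num_prompts, points_per_prompt, 2))
    scores = 1.0 / (1.0 + np.exp(-rng.standard_normal(num_prompts)))  # sigmoid
    return points, scores


def frozen_promptable_segmenter(image: np.ndarray, points: np.ndarray,
                                radius: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a frozen SAM: marks a disk around each point prompt."""
    h, w, _ = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=bool)
    for x, y in points:
        mask |= (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
    return mask


def segment_instances(image: np.ndarray, prompt: str, threshold: float = 0.5):
    """Text prompt -> frozen embedding -> adapter prompts -> filtered -> one mask each."""
    embedding = frozen_text_segmenter(image, prompt)
    points, scores = adapter(embedding)
    keep = scores >= threshold  # the 'classifier' step discards low-confidence prompts
    return [frozen_promptable_segmenter(image, p) for p in points[keep]]


image = np.zeros((32, 48, 3))
masks = segment_instances(image, "cable")
print(len(masks), [m.shape for m in masks])
```

In this sketch only `adapter` would carry trainable parameters; the two stand-ins mirror the frozen foundation models, which is why the adapter's prompt quality and the frozen segmenter's own accuracy together bound the final result.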
Abstract
In robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when perceiving Deformable Linear Objects (DLOs) such as wires, cables, and flexible tubes. This challenge arises primarily from the lack of distinctive attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of the CLIPSeg model with the zero-shot generalization capabilities of the Segment Anything Model (SAM). We show that our method surpasses state-of-the-art (SOTA) performance on DLO instance segmentation, achieving an mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.
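For reference, mIoU averages the intersection-over-union of predicted and ground-truth masks, where $\mathrm{IoU}(P, G) = |P \cap G| / |P \cup G|$. The sketch below is a minimal illustration, assuming predicted instance masks have already been paired one-to-one with ground-truth masks; the pairing protocol used for the reported $91.21\%$ is not described here, so `mean_iou` and its input format are hypothetical.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pairs) -> float:
    """Average IoU over (predicted, ground-truth) mask pairs (assumed pre-matched)."""
    return float(np.mean([iou(p, g) for p, g in pairs]))


# Pair 1: 4-pixel masks overlapping on 2 pixels -> IoU = 2 / 6 = 1/3.
a = np.zeros((4, 4), dtype=bool)
a[0, :] = True
b = np.zeros((4, 4), dtype=bool)
b[0:2, 2:] = True
# Pair 2: identical masks -> IoU = 1.
print(mean_iou([(a, b), (a, a)]))  # mean of 1/3 and 1 = 2/3
```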
