ISCUTE: Instance Segmentation of Cables Using Text Embedding

Shir Kozlovsky; Omkar Joglekar; Dotan Di Castro

TL;DR

This paper tackles the identification and segmentation of Deformable Linear Objects (DLOs) such as cables, for which the usual perceptual cues (shape, color, texture) are weak. It introduces ISCUTE, an adapter that bridges CLIPSeg's text-conditioned segmentation with SAM's prompt-driven segmentation, enabling text-prompted, one-shot DLO instance segmentation while both foundation models stay frozen. A diverse, CAD-generated DLO dataset (~30k images) supports training and evaluation, and the approach achieves a state-of-the-art mIoU of about $91\%$ with strong zero-shot generalization to external DLO datasets. The work offers a user-friendly, text-driven solution for DLO perception, identifies the performance upper bound imposed by the underlying foundation models, and outlines future improvements to the classifier component and to broader applicability.

Abstract

In the field of robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when it comes to perceiving Deformable Linear Objects (DLOs) like wires, cables, and flexible tubes. This challenge arises primarily from the lack of distinct attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of the CLIPSeg model with the zero-shot generalization capabilities of the Segment Anything Model (SAM). We show that our method exceeds SOTA performance on DLO instance segmentation, achieving an mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.
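For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed over binary instance masks. It uses the standard textbook definition and assumes predicted masks have already been paired with their ground-truth instances; the function names `iou` and `mean_iou`, the list-of-lists mask format, and the empty-mask convention are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask, 1 = cable pixel (assumed format)


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average IoU over already-matched (prediction, ground truth) pairs."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("need at least one mask pair")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy example: one partially correct mask and one exact match.
pred_a = [[1, 1, 0], [0, 0, 0]]
gt_a = [[1, 1, 1], [0, 0, 0]]   # intersection 2, union 3
pred_b = [[0, 0, 0], [1, 1, 1]]
gt_b = [[0, 0, 0], [1, 1, 1]]   # intersection 3, union 3
example_miou = mean_iou([pred_a, pred_b], [gt_a, gt_b])
```

In this toy case the two per-instance scores are 2/3 and 1, so the mean is 5/6. How the paper pairs predicted instances with ground-truth instances before averaging is not stated in this summary.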

Paper Structure (22 sections, 14 figures, 5 tables)

This paper contains 22 sections, 14 figures, 5 tables.

Introduction
Related Work
DLOs Instance Segmentation
Foundation Models
Methods
The ISCUTE Model
Prompt encoder network
Mask classification network
Training protocol
Experiments
The cables dataset
Baseline experiments
Quantitative experiments
Results
Quantitative results
...and 7 more sections

Figures (14)

Figure 1: Overview of the full pipeline - blocks in red represent our additions
Figure 2: Issues with using SAM out-of-the-box
Figure 3: The ISCUTE adapter: on the left, the architecture of the prompt encoder network is outlined (indicated by the purple dashed line), while on the right, the classifier network architecture is detailed (represented by the pink dashed line).
Figure 4: Qualitative comparison in specific scenarios. Each scenario demonstrates the following: (a) and (b) real images, (c) identical colors, (d) a high density of cables in a single image, and (e) and (f) small DLOs at the edge of the image with varying thicknesses.
Figure 5: A qualitative comparison of our model vs. the SOTA baselines
...and 9 more figures
