iSeg: Interactive 3D Segmentation via Interactive Attention

Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, Rana Hanocka

TL;DR

This work designs a segmentation method conditioned on fine user clicks, which operates entirely in 3D, and proposes a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model.

Abstract

We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is highly challenging, since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.
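
The idea of one attention layer that accepts any number of positive and negative clicks can be illustrated with a small NumPy sketch. This is not the authors' implementation: the use of separate key/value projections per click type, the names `interactive_attention` and `init_params`, and the random weights are all assumptions made for illustration. Each vertex feature acts as a query, and the features at the clicked vertices act as keys and values, so the output shape does not depend on the click count.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(dim, seed=0):
    # Hypothetical random weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    names = ["Wq", "Wk_pos", "Wk_neg", "Wv_pos", "Wv_neg"]
    return {n: rng.normal(scale=dim ** -0.5, size=(dim, dim)) for n in names}

def interactive_attention(vertex_feats, click_idx, click_sign, params):
    """Cross-attention from all vertices to a variable-length set of clicks.

    vertex_feats: (V, D) per-vertex features.
    click_idx:    (C,) indices of clicked vertices, C >= 1.
    click_sign:   (C,) +1 for a positive click, -1 for a negative click.
    Returns (V, D) click-conditioned vertex features.
    """
    click_idx = np.asarray(click_idx)
    is_pos = (np.asarray(click_sign) > 0)[:, None]        # (C, 1)
    click_feats = vertex_feats[click_idx]                 # (C, D)

    # Assumption: positive and negative clicks use separate projections.
    keys = np.where(is_pos, click_feats @ params["Wk_pos"],
                    click_feats @ params["Wk_neg"])       # (C, D)
    vals = np.where(is_pos, click_feats @ params["Wv_pos"],
                    click_feats @ params["Wv_neg"])       # (C, D)

    queries = vertex_feats @ params["Wq"]                 # (V, D)
    scores = queries @ keys.T / np.sqrt(queries.shape[1])  # (V, C)
    weights = softmax(scores, axis=1)                     # normalize over clicks
    return weights @ vals                                 # (V, D)

# Usage: the same layer handles one click or several clicks of mixed type.
feats = np.random.default_rng(1).normal(size=(100, 16))
params = init_params(16)
one_click = interactive_attention(feats, [3], [1], params)
two_clicks = interactive_attention(feats, [3, 42], [1, -1], params)
print(one_click.shape, two_clicks.shape)
```

Because the softmax normalizes over the click axis, adding or removing clicks changes only the length of that axis, which is what lets a single set of weights serve every click configuration in this sketch.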

Paper Structure

This paper contains 43 sections, 12 equations, 24 figures, and 3 tables.

Figures (24)

  • Figure 1: Fine-grained segmentation from a single positive click. iSeg is capable of generating granular segmentations (visualized in blue) given a single click as input (depicted with a green dot). Our method is highly flexible and can select parts that vary in size, geometry, and semantic meaning.
  • Figure 2: Training of the iSeg decoder. Our decoder takes the Mesh Feature Field (MFF) computed by the iSeg encoder, along with the user input clicks, and generates a 3D segmentation map, visualized in blue. We leverage a pre-trained 2D segmentation model (Kirillov et al., 2023) to supervise our training with 2D segmentation masks, using rendered images of the shape and the 2D projection of the 3D clicks. Although iSeg is trained using noisy and inconsistent 2D segmentations, it is view-consistent by construction.
  • Figure 3: Interactive Attention. Our interactive attention layer can handle a variable number of user clicks. The clicks may be positive or negative to indicate region inclusion or exclusion, respectively.
  • Figure 4: Native 3D segmentation. iSeg segments parts in a 3D-consistent manner, regardless of whether the surface is occluded from the point click. A point is selected on the back of the chair (left), which is not visible from the front view. Still, our method delineates the occluded surface even though the 2D training data cannot contain this information. Furthermore, we may input two point clicks occluded from each other, one on the back of the chair and one on the front (right). These points cannot be simultaneously input to any 2D decoder, as they are not visible concurrently from any single viewpoint. Nonetheless, iSeg faithfully segments the whole backrest part.
  • Figure 5: Results with a couple of clicks. iSeg produces fine-grained segmentations from a couple of clicks (both positive and negative) as input. Each pair of shapes starts with a segmentation from a single positive click (left), which can be further customized using an additional click (right).
  • ...and 19 more figures
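
The Figure 2 caption mentions projecting 3D clicks into 2D so that a 2D segmentation model can be prompted on a rendered view. A minimal sketch of that projection step, assuming a standard pinhole camera that looks down the +z axis with the principal point at the image center, is given below. The function name `project_click`, the camera convention, and the parameters are illustrative assumptions and are not taken from the paper; no occlusion test is performed.

```python
import numpy as np

def project_click(point_world, world_to_cam, focal, width, height):
    """Project a 3D click to pixel coordinates with a pinhole camera.

    point_world:  (3,) click position in world coordinates.
    world_to_cam: (4, 4) rigid transform from world to camera space.
    focal:        focal length in pixels (assumed equal for x and y).
    Returns (u, v) in pixels, or None if the click is behind the camera
    or falls outside the image. Visibility (occlusion) is not checked.
    """
    p_cam = world_to_cam @ np.append(point_world, 1.0)  # homogeneous transform
    x, y, z = p_cam[:3]
    if z <= 0:
        return None  # behind the camera plane
    u = focal * x / z + width / 2.0   # perspective divide, then shift to center
    v = focal * y / z + height / 2.0
    if not (0 <= u < width and 0 <= v < height):
        return None  # outside the rendered image
    return float(u), float(v)

# Usage: identity camera, click two units in front of it.
cam = np.eye(4)
print(project_click(np.array([1.0, 0.0, 2.0]), cam, 100.0, 200, 200))
```

A click that projects to `None` in a given view would simply not be usable as a 2D prompt for that view in this sketch, which is consistent with the observation in Figure 4 that some click pairs are never visible together from a single viewpoint.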