Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation

Zhaoyang Li, Yuan Wang, Wangkai Li, Rui Sun, Tianzhu Zhang

TL;DR

This paper tackles point-cloud few-shot semantic segmentation (PC-FSS) by identifying the limitations of direct point-level prototype matching, especially background confusion and intra-class diversity. It introduces Decoupled Localization and Expansion (DLE), comprising a Structural Localization Module that uses semantically aware agents for distribution-level matching and a Self-Expansion Module that expands localized regions using intra-object query cues with a conservative consistency check. The approach yields substantial performance gains on S3DIS and ScanNet across 1-shot and few-shot settings, demonstrating improved target localization and more complete foreground excavation. By combining structure-aware matching with query-driven expansion, DLE provides robust, data-efficient segmentation in complex 3D scenes.

Abstract

Point cloud few-shot semantic segmentation (PC-FSS) aims to segment targets of novel categories in a given query point cloud with only a few annotated support samples. The current top-performing prototypical learning methods employ prototypes originating from support samples to direct the classification of query points. However, the inherent fragility of point-level matching and the prevalent intra-class diversity pose great challenges to this cross-instance matching paradigm, leading to erroneous background activations or incomplete target excavation. In this work, we propose a simple yet effective framework in the spirit of Decoupled Localization and Expansion (DLE). The proposed DLE, including a structural localization module (SLM) and a self-expansion module (SEM), enjoys several merits. First, structural information is injected into the matching process through the agent-level correlation in SLM, and the confident target region can thus be precisely located. Second, more reliable intra-object similarity is harnessed in SEM to derive the complete target, and the conservative expansion strategy is introduced to reasonably constrain the expansion. Extensive experiments on two challenging benchmarks under different settings demonstrate that DLE outperforms previous state-of-the-art approaches by large margins.
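
The localize-then-expand idea can be illustrated with a toy sketch. This is not the paper's SLM or SEM: the agent-level correlation and the conservative expansion strategy are not specified in this summary, so the sketch substitutes plain thresholded cosine similarity for both steps. The function names, the thresholds `tau_loc` and `tau_exp`, and the 3-D feature vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def localize(support_feats, support_mask, query_feats, tau_loc=0.95):
    """Assumed stand-in for localization: pool the support foreground into
    one prototype and keep only query points that match it confidently."""
    foreground = [f for f, m in zip(support_feats, support_mask) if m]
    prototype = mean_vector(foreground)
    return [cosine(q, prototype) >= tau_loc for q in query_feats]

def expand(query_feats, seed_mask, tau_exp=0.85):
    """Assumed stand-in for expansion: build a prototype from the query's own
    seed points and add query points that are similar enough to it."""
    seeds = [q for q, s in zip(query_feats, seed_mask) if s]
    if not seeds:
        return list(seed_mask)  # nothing confident to grow from
    query_proto = mean_vector(seeds)
    return [s or cosine(q, query_proto) >= tau_exp
            for q, s in zip(query_feats, seed_mask)]

# Hypothetical features: two support foreground points and one background point.
support_feats = [[1.0, 0.1, 0.0], [1.0, -0.1, 0.0], [0.0, 0.0, 1.0]]
support_mask = [True, True, False]

query_feats = [
    [1.0, 0.2, 0.0],  # target point that resembles the support object
    [0.8, 0.6, 0.0],  # target point that differs from the support object
    [0.0, 0.0, 1.0],  # background
    [0.8, 0.0, 0.6],  # background that partly resembles the support object
]

seeds = localize(support_feats, support_mask, query_feats)
full = expand(query_feats, seeds)
print(seeds)
print(full)
```

In this toy data the second and fourth query points are equally similar to the support prototype, so no single support-side threshold can separate them. The strict localization threshold keeps only the first point as a seed, and the second point is recovered in the expansion step because it is closer to the query's own seed than the background distractor is.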

Paper Structure

This paper contains 16 sections, 12 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Motivation of our method. (a), (b) Different types of segmentation deficiencies resulting from the inherent fragility of point-level matching and the prevalent intra-class diversity. (c) We employ distribution-level matching that incorporates structural information to replace point-level matching. (d) We leverage intra-object similarity to fully excavate targets, thereby circumventing the impact of intra-class diversity.
  • Figure 2: t-SNE visualization highlighting intra-class diversity and intra-object similarity.
  • Figure 3: Illustration of the proposed DLE. There are two main modules in DLE. The structural localization module is responsible for precisely locating confident target regions by introducing a set of semantically structure-aware agents. The self-expansion module expands the located target region to mine extensive query information further. These two modules collaboratively constitute a decoupled localization and expansion (DLE) framework for Point Cloud Few-Shot Segmentation.
  • Figure 4: Qualitative results of our method in 1-way 1-shot point cloud few-shot segmentation on the ScanNet dataset, compared with the ground truth, AttMPTI, and QGE. The target classes from top to bottom are "bathtub" (first row), "bed" (second row), "table" (third row), and "toilet" (last row).
  • Figure 5: t-SNE visualization of the feature distributions of the support foreground prototype, query foreground, and query background in the original feature space (a) and in the agent-dimension-modified space (b).
  • ...and 2 more figures