Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field

Wenhao Hu, Wenhao Chai, Shengyu Hao, Xiaotong Cui, Xuexiang Wen, Jenq-Neng Hwang, Gaoang Wang

TL;DR

CCGS tackles the challenge of maintaining semantic coherence across views in 3D segmentation by jointly ensuring view-consistent 2D segmentation and a compact 3D Gaussian segmentation field. It introduces pointmap-based association to unify multi-view masks and a Hungarian-matching-based mask association that tolerates partial matches, yielding consistent 2D labels and a 3D segmented point cloud. To enforce compactness and structure in the 3D field, CCGS employs piecewise-plane constrained Gaussian splatting with plane regularization and a split-projection strategy that preserves object boundaries and minimizes floaters. Experimental results on ScanNet and Replica demonstrate state-of-the-art performance in both 2D and 3D segmentation, with notable gains in multi-view consistency and downstream robustness for editing tasks. Overall, the approach provides a practical pathway to accurate, view-stable 3D scene representations suitable for perception, manipulation, and editing tasks.

Abstract

Achieving a consistent and compact 3D segmentation field is crucial for maintaining semantic coherence across views and accurately representing scene structures. Previous 3D scene segmentation methods rely on video segmentation models to address inconsistencies across views, but the absence of spatial information often leads to object misassociation when objects temporarily disappear and reappear. Furthermore, 3D scene reconstruction pipelines often treat segmentation and optimization as separate tasks. As a result, optimization typically lacks awareness of semantic category information, which can produce floaters with ambiguous segmentation. To address these challenges, we introduce CCGS, a method designed to achieve both view-consistent 2D segmentation and a compact 3D Gaussian segmentation field. CCGS incorporates pointmap association and a piecewise-plane constraint. First, we establish pixel correspondence between adjacent images by minimizing the Euclidean distance between their pointmaps. We then redefine object mask overlap accordingly. The Hungarian algorithm is employed to optimize mask association by minimizing the total matching cost, while allowing for partial matches. To further enhance compactness, the piecewise-plane constraint restricts point displacement within local planes during optimization, thereby preserving structural integrity. Experimental results on the ScanNet and Replica datasets demonstrate that CCGS outperforms existing methods in both 2D panoptic segmentation and 3D Gaussian segmentation.
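
The association step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `max_dist` and `min_overlap` thresholds, and the overlap definition (the fraction of a mask's corresponded pixels that land in the other mask) are hypothetical choices made here, and both pointmaps are assumed to be expressed in a shared world frame. It relies on SciPy's `cKDTree` for nearest-neighbor lookup and `linear_sum_assignment` for Hungarian matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import cKDTree


def pixel_correspondence(pm_a, pm_b, max_dist=0.05):
    """For each pixel of view A, find the pixel of view B whose 3D point is
    closest in Euclidean distance. pm_a and pm_b are (H, W, 3) pointmaps.
    Returns flat indices into B, or -1 where no point lies within max_dist
    (max_dist is an assumed threshold, not a value from the paper)."""
    pts_a = pm_a.reshape(-1, 3)
    pts_b = pm_b.reshape(-1, 3)
    dist, idx = cKDTree(pts_b).query(pts_a)
    return np.where(dist <= max_dist, idx, -1)


def associate_masks(labels_a, labels_b, corr, min_overlap=0.3):
    """Match instance ids of view A to ids of view B.
    labels_a, labels_b: (H, W) integer instance maps; corr: output of
    pixel_correspondence. Returns a dict {id_in_a: id_in_b}; ids that are
    absent from the dict stay unmatched (a partial match)."""
    ids_a, ids_b = np.unique(labels_a), np.unique(labels_b)
    flat_a, flat_b = labels_a.ravel(), labels_b.ravel()
    valid = corr >= 0

    # Assumed overlap: share of A-mask pixels (with a correspondence)
    # whose corresponding B pixel carries a given B id.
    overlap = np.zeros((len(ids_a), len(ids_b)))
    for i, a in enumerate(ids_a):
        sel = valid & (flat_a == a)
        if not sel.any():
            continue  # mask not visible in B; its row stays zero
        hit = flat_b[corr[sel]]
        for j, b in enumerate(ids_b):
            overlap[i, j] = np.mean(hit == b)

    # Hungarian matching minimizes total cost, so negate the overlap.
    rows, cols = linear_sum_assignment(-overlap)
    return {
        int(ids_a[r]): int(ids_b[c])
        for r, c in zip(rows, cols)
        if overlap[r, c] >= min_overlap  # drop weak pairs
    }
```

Pairs whose overlap falls below `min_overlap` are dropped after the assignment, which is one simple way to allow partial matches: a mask that has no counterpart in the other view keeps its own identity instead of being forced onto an unrelated mask.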

Paper Structure

This paper contains 29 sections, 15 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Differences in mask association: Video vs. Pointmap. Video segmentation often struggles to maintain consistency during significant changes in camera view. In contrast, constructing a unified 3D point cloud field can ensure segmentation accuracy by leveraging spatial information.
  • Figure 2: The pipeline of our method. (a) We first construct a unified point cloud field and establish correspondences between pixels using the pointmaps. (b) Leveraging these relationships, we construct a cost matrix for instance masks across two frames. The Hungarian algorithm is then applied to optimize the cost matrix, ensuring consistent mask association. By merging all frames, we obtain a point cloud enriched with consistent segmentation information. (c) This point cloud serves as the initialization for 3D Gaussians. To achieve compact 3D segmentation, we employ a piecewise-plane constraint, restricting point displacement within local planes through plane regularization and split projection.
  • Figure 3: 2D segmentation results on Replica and ScanNet datasets. From left to right, the columns show ground truth segmentation, Panoptic Lifting, Contrastive Lift, Feature 3DGS, SAGA, Gaussian Grouping, and ours (CCGS). The top four rows show different scenes in Replica. The bottom four rows show different scenes in ScanNet.
  • Figure 4: Comparison of multi-view consistency among CCGS, GG (Gaussian Grouping), and PL (Panoptic Lifting). From top to bottom, the images display CCGS, GG, PL, and RGB inputs, respectively.
  • Figure 5: 3D Gaussian segmentation results on ScanNet and Replica datasets. Each scene consists of a ground truth mesh, ground truth point cloud segmentation, our method (CCGS), Gaussian Grouping, as well as coarse-level and fine-level OpenGaussian results.
  • ...and 2 more figures
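
The piecewise-plane constraint mentioned in the Figure 2 caption (plane regularization and split projection) can be illustrated with a small geometric sketch. The details of the paper's formulation are not given in this section, so everything below is an assumption made for illustration: a local plane is fitted to a group of points by least squares (via SVD), the regularizer is taken to be the mean squared point-to-plane distance, and newly split points are projected back onto the plane. All function names are hypothetical.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through (N, 3) points, N >= 3.
    Returns (centroid, unit normal). The normal is the right singular
    vector with the smallest singular value of the centered points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]


def plane_distance(points, centroid, normal):
    """Signed distance of each point to the plane."""
    return (points - centroid) @ normal


def plane_regularizer(points, centroid, normal):
    """Assumed penalty: mean squared distance to the local plane."""
    return float(np.mean(plane_distance(points, centroid, normal) ** 2))


def project_to_plane(points, centroid, normal):
    """Move points along the normal until they lie on the plane, as one
    might do for points created by a split (assumed behavior)."""
    d = plane_distance(points, centroid, normal)
    return points - d[:, None] * normal
```

In this sketch, a point that drifts off its local plane increases the regularizer, and projecting it removes only the out-of-plane component, so motion within the plane is left untouched.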