
GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision

Zehao Li, Wenwei Han, Yujun Cai, Hao Jiang, Baolong Bi, Shuqin Gao, Honglong Zhao, Zhaoqi Wang

TL;DR

GradiSeg addresses the persistent problem of blurry object boundaries in 3D semantic segmentation. Built on 3D Gaussian Splatting, it introduces Identity Encoding together with two boundary-aware modules, IGD and LA-KNN. IGD adaptively densifies and splits Gaussians near boundaries based on their Identity Encoding gradients. LA-KNN enforces direction-aware local consistency of Identity Encodings to prevent spurious propagation across boundaries. On LERF-Mask, the method achieves state-of-the-art open-vocabulary and multi-view segmentation, with average mIoU gains of about $5.27\%$ and boundary gains of about $6.3\%$. It does so without sacrificing reconstruction quality on Mip-NeRF 360. These contributions enable more precise 3D scene understanding and support real-time downstream editing tasks such as object removal and swapping.
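
mIoU averages the per-query intersection-over-union between a predicted mask and its ground-truth mask. The following is a minimal sketch of that metric. It assumes binary masks flattened into equal-length boolean lists. The helper names `iou` and `mean_iou` are illustrative and are not taken from the paper's evaluation code.

```python
def iou(pred: list[bool], gt: list[bool]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Convention assumed here: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(pairs: list[tuple[list[bool], list[bool]]]) -> float:
    """Average IoU over (prediction, ground truth) pairs, e.g. one pair per text query."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)
```

For example, a prediction that covers pixels 0 to 3 against a ground truth covering pixels 1 to 4 shares 3 of 5 union pixels, giving an IoU of 0.6.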

Abstract

While 3D Gaussian Splatting enables high-quality real-time rendering, existing Gaussian-based frameworks for 3D semantic segmentation still struggle to recognize object boundaries accurately. To address this, we propose GradiSeg, a novel 3DGS-based framework that incorporates Identity Encoding to build a deeper semantic understanding of scenes. Our approach introduces two key modules: Identity Gradient Guided Densification (IGD) and Local Adaptive K-Nearest Neighbors (LA-KNN). The IGD module monitors the gradients of Identity Encoding to refine the Gaussian distribution along object boundaries, aligning Gaussians closely with boundary contours. Meanwhile, the LA-KNN module uses position gradients to adaptively establish locality-aware propagation of Identity Encodings, preventing Gaussians from spreading irregularly near boundaries. We validate the method through comprehensive experiments. The results show that GradiSeg effectively resolves boundary-related issues, significantly improving segmentation accuracy without compromising scene reconstruction quality. Furthermore, its robust segmentation and decoupled Identity Encoding representation make it well suited to downstream scene editing tasks such as 3D object removal and object swapping.

Paper Structure

This paper contains 32 sections, 5 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: We adopt Identity Encoding to construct 3D semantic segmentation. From the original 3D scene (column a), we render only the Gaussians with unusually high Identity Encoding gradients, producing a selective rendering (column c). These Gaussians cluster predominantly around object boundaries. For comparison, we show a locally enlarged view of the original 3D scene (column b) and visualize the Identity Encoding features of the selected Gaussians (column d).
  • Figure 2: Overview of the proposed method. a) We adopt Identity Encoding as a learnable vector to construct a semantic understanding of the scene. This vector is optimized through multi-view supervision to produce initial segmentation results. b) To tackle boundary ambiguity, we introduce two boundary enhancement modules: IGD and LA-KNN. IGD refines Gaussians near object boundaries by monitoring Identity Encoding gradients. Complementarily, LA-KNN enables direction-aware feature propagation by leveraging position gradients for neighbor selection, preventing cross-instance feature contamination at boundaries.
  • Figure 3: The process of the IGD module. The first row shows Identity Encoding gradient monitoring. Gaussians near boundaries keep adjusting their Identity Encoding during optimization, so their gradients grow steadily and can become anomalously large. The second row shows Identity Encoding densification. Gaussians with anomalous gradients are split, and the resulting Gaussians are moved to either side of the boundary, which resolves the optimization conflict during training.
  • Figure 4: The process of the LA-KNN module. We first compute the neighboring direction as the opposite of the Gaussian's position gradient. We then discard every Gaussian outside the 180-degree half-space centered on this direction vector, that is, any Gaussian whose angle with the direction vector exceeds 90 degrees. We sort the remaining Gaussians by their projection distance to the direction vector and select the $K$ nearest neighbors, where $K=2$. Finally, we align the Identity Encoding features within this local neighborhood.
  • Figure 5: Visual comparison on the LERF-Mask dataset. For each scene, the first column shows the 3D reconstruction rendering. For each text prompt, we use Grounding DINO to select the corresponding object IDs for rendering. The second column shows the results of Gaussian Grouping, and the third column shows ours. In addition, we manually select the corresponding object IDs to show that our renderings are sufficiently accurate.
  • ...and 5 more figures
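
Figure 1 selects Gaussians whose Identity Encoding gradients are "unusually high", but the caption does not state the criterion. The sketch below therefore makes two assumptions:

- Each Gaussian's Identity Encoding gradient norm is averaged over the steps in which that Gaussian receives a gradient. This mirrors how 3DGS accumulates positional gradients for densification.
- A Gaussian is flagged when its averaged norm lies more than `k` standard deviations above the mean.

`IdentityGradTracker`, `select_anomalous`, and the default `k = 2.0` are hypothetical.

```python
from __future__ import annotations

import math
import statistics


class IdentityGradTracker:
    """Hypothetical helper: running average of per-Gaussian Identity Encoding gradient norms."""

    def __init__(self, num_gaussians: int) -> None:
        self.accum = [0.0] * num_gaussians
        self.count = [0] * num_gaussians

    def update(self, grads: list[list[float] | None]) -> None:
        """grads[i] is dL/d(identity_i) for this step, or None if Gaussian i got no gradient."""
        for i, g in enumerate(grads):
            if g is None:
                continue
            self.accum[i] += math.sqrt(sum(x * x for x in g))  # L2 norm of the gradient
            self.count[i] += 1

    def mean_norms(self) -> list[float]:
        """Average norm per Gaussian; Gaussians never updated report 0.0."""
        return [a / c if c else 0.0 for a, c in zip(self.accum, self.count)]


def select_anomalous(norms: list[float], k: float = 2.0) -> list[int]:
    """Indices whose averaged norm exceeds mean + k * std (assumed outlier rule)."""
    threshold = statistics.fmean(norms) + k * statistics.pstdev(norms)
    return [i for i, n in enumerate(norms) if n > threshold]
```

The indices returned by `select_anomalous` are the Gaussians one would keep for a selective rendering like column (c).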
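
Figure 2(a) treats Identity Encoding as a learnable per-Gaussian vector optimized through multi-view supervision. The caption does not spell out the rendering or the loss, so the sketch below assumes the scheme used by Gaussian Grouping, the baseline in Figure 5:

- Identity vectors are alpha-composited front to back along each pixel ray, like colors in 3DGS.
- A linear layer maps the rendered vector to instance-ID logits.
- Softmax cross-entropy compares those logits with the instance ID taken from a 2D mask.

All function names are illustrative.

```python
import math


def composite_identity(features: list[list[float]], alphas: list[float]) -> list[float]:
    """Front-to-back alpha blending: E = sum_i e_i * a_i * prod_{j<i} (1 - a_j)."""
    out = [0.0] * len(features[0])
    transmittance = 1.0
    for e, a in zip(features, alphas):
        weight = a * transmittance
        for d, value in enumerate(e):
            out[d] += weight * value
        transmittance *= 1.0 - a
    return out


def linear(weight: list[list[float]], bias: list[float], x: list[float]) -> list[float]:
    """Maps a rendered identity vector to one logit per instance ID."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits: list[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single pixel."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

Summing `cross_entropy` over pixels from several training views gives the kind of multi-view supervision signal the caption refers to, under the assumptions above.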
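
Figure 3 splits Gaussians with anomalous Identity Encoding gradients and moves the pieces to both sides of the boundary. The caption does not say how the split is placed, so the sketch below makes two assumptions:

- Each flagged Gaussian is replaced by two children, offset by one standard deviation in opposite directions along its longest axis.
- Both children shrink by a factor of 1.6. This matches the 3DGS split rule for two children (scale divided by 0.8 × 2), although 3DGS samples child positions randomly rather than using fixed offsets.

Rotation is omitted and scales are axis-aligned to keep the sketch short. `Gaussian`, `split_gaussian`, and `identity_guided_densify` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    position: tuple[float, float, float]
    scale: tuple[float, float, float]  # axis-aligned std devs; rotation omitted in this sketch
    identity: tuple[float, ...]        # Identity Encoding, copied unchanged to both children


def split_gaussian(g: Gaussian, shrink: float = 1.6) -> tuple[Gaussian, Gaussian]:
    """Replace g by two smaller Gaussians on opposite sides of its center."""
    axis = max(range(3), key=lambda i: g.scale[i])  # longest axis (assumed split direction)
    offset = g.scale[axis]
    new_scale = tuple(s / shrink for s in g.scale)
    children = []
    for sign in (-1.0, 1.0):
        pos = list(g.position)
        pos[axis] += sign * offset
        children.append(replace(g, position=tuple(pos), scale=new_scale))
    return children[0], children[1]


def identity_guided_densify(
    gaussians: list[Gaussian], grad_norms: list[float], threshold: float
) -> list[Gaussian]:
    """Split every Gaussian whose averaged Identity Encoding gradient norm exceeds threshold."""
    out: list[Gaussian] = []
    for g, n in zip(gaussians, grad_norms):
        if n > threshold:
            out.extend(split_gaussian(g))
        else:
            out.append(g)
    return out
```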
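
The neighbor selection and alignment in Figure 4 can be sketched as follows, under three assumptions:

- The angular filter keeps Gaussians in the half-space facing the direction vector, i.e. those with a non-negative dot product.
- "Projection distance" is read as the scalar projection of the offset onto that direction. Reading it as perpendicular distance to the line would change only the sort key.
- "Aligning" Identity Encodings is modeled as the mean KL divergence between the softmax-normalized encodings of a Gaussian and its selected neighbors, in the spirit of Gaussian Grouping's 3D regularization.

Function names are hypothetical. The brute-force scan costs O(N) per query, where a practical implementation would use a spatial index.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def la_knn(i: int, positions: list[tuple[float, ...]], pos_grad: tuple[float, ...], k: int = 2) -> list[int]:
    """Direction-aware neighbors of Gaussian i (brute force)."""
    direction = [-g for g in pos_grad]  # opposite of the position gradient
    norm = math.sqrt(_dot(direction, direction))
    if norm == 0.0:
        return []  # assumed: no gradient means no preferred direction, hence no neighbors
    direction = [x / norm for x in direction]
    candidates = []
    for j, p in enumerate(positions):
        if j == i:
            continue
        offset = [pj - pi for pj, pi in zip(p, positions[i])]
        proj = _dot(offset, direction)
        if proj >= 0.0:  # keep the half-space facing the direction vector
            candidates.append((proj, j))
    candidates.sort()  # smallest projection distance first
    return [j for _, j in candidates[:k]]


def softmax(z: list[float]) -> list[float]:
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q))


def local_alignment_loss(i: int, neighbors: list[int], identities: list[list[float]]) -> float:
    """Mean KL(softmax(e_i) || softmax(e_j)) over the selected neighbors j."""
    if not neighbors:
        return 0.0
    p = softmax(identities[i])
    return sum(kl_divergence(p, softmax(identities[j])) for j in neighbors) / len(neighbors)
```

Reversing the position gradient flips which half-space survives the filter. This is what keeps the neighbor set from straddling a boundary in the figure's illustration.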