GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision
Zehao Li, Wenwei Han, Yujun Cai, Hao Jiang, Baolong Bi, Shuqin Gao, Honglong Zhao, Zhaoqi Wang
TL;DR
GradiSeg addresses the persistent challenge of blurry object boundaries in 3D semantic segmentation by combining Identity Encoding with two boundary-aware modules, IGD and LA-KNN, within a 3D Gaussian Splatting framework. IGD adaptively densifies and splits Gaussians near boundaries based on Identity Encoding gradients, while LA-KNN enforces direction-aware local consistency of Identity Encodings to prevent spurious propagation. On LERF-Mask, the method achieves state-of-the-art results for open-vocabulary and multi-view segmentation, with average mIoU gains of about $5.27\%$ and boundary gains of about $6.3\%$, without sacrificing reconstruction quality on Mip-NeRF 360. These contributions enable more precise 3D scene understanding and support downstream editing tasks such as object removal and swapping in real time.
Abstract
While 3D Gaussian Splatting (3DGS) enables high-quality real-time rendering, existing Gaussian-based frameworks for 3D semantic segmentation still struggle to recognize object boundaries accurately. To address this, we propose GradiSeg, a novel 3DGS-based framework that incorporates Identity Encoding to build a deeper semantic understanding of scenes. Our approach introduces two key modules: Identity Gradient Guided Densification (IGD) and Local Adaptive K-Nearest Neighbors (LA-KNN). The IGD module supervises the gradients of Identity Encoding to refine the distribution of Gaussians along object boundaries, aligning them closely with boundary contours. Meanwhile, the LA-KNN module uses position gradients to adaptively establish locality-aware propagation of Identity Encodings, preventing irregular Gaussian spreads near boundaries. Comprehensive experiments show that GradiSeg effectively addresses boundary-related issues, significantly improving segmentation accuracy without compromising scene reconstruction quality. Furthermore, the method's robust segmentation capability and decoupled Identity Encoding representation make it well suited to downstream scene editing tasks such as 3D object removal and swapping.
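To make the IGD idea concrete, here is a minimal NumPy sketch of gradient-thresholded splitting: Gaussians whose Identity Encoding gradient magnitude is large are replaced by two smaller children. It is an illustration under stated assumptions, not the authors' implementation. The function name `split_by_identity_gradient`, the parameters `threshold` and `shrink`, the use of isotropic scales, and the two-children rule are all hypothetical choices made for this example; the abstract does not specify them.

```python
import numpy as np


def split_by_identity_gradient(positions, scales, id_grad_norm,
                               threshold=0.5, shrink=1.6, rng=None):
    """Split Gaussians whose identity-gradient norm exceeds `threshold`.

    positions:    (N, 3) Gaussian centers
    scales:       (N,)   isotropic standard deviations (a simplification)
    id_grad_norm: (N,)   accumulated gradient magnitude of each Gaussian's
                         Identity Encoding (assumed to be given)
    `threshold` and `shrink` are hypothetical hyperparameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mask = id_grad_norm > threshold            # Gaussians assumed to sit on a boundary
    parent_pos = positions[mask]               # (M, 3)
    parent_scale = scales[mask]                # (M,)

    # Sample two children per parent from the parent's own distribution.
    offsets = rng.normal(size=(2, *parent_pos.shape)) * parent_scale[None, :, None]
    child_pos = (parent_pos[None] + offsets).reshape(-1, 3)   # (2M, 3)
    child_scale = np.tile(parent_scale / shrink, 2)           # (2M,), smaller children

    # Keep unsplit Gaussians and replace each parent with its two children.
    new_positions = np.concatenate([positions[~mask], child_pos])
    new_scales = np.concatenate([scales[~mask], child_scale])
    return new_positions, new_scales, mask


positions = np.zeros((4, 3))
scales = np.array([1.0, 2.0, 1.0, 4.0])
id_grad_norm = np.array([0.1, 0.9, 0.2, 0.8])
new_positions, new_scales, mask = split_by_identity_gradient(
    positions, scales, id_grad_norm)
```

In this toy call, two of the four Gaussians exceed the threshold, so the result holds the two untouched Gaussians plus four smaller children. A full pipeline would also copy or re-initialize the remaining per-Gaussian attributes (opacity, color, Identity Encoding), which this sketch omits.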
