FHGS: Feature-Homogenized Gaussian Splatting
Q. G. Duan, Benyun Zhao, Mingqiao Han, Yijun Huang, Ben M. Chen
TL;DR
FHGS addresses the mismatch between anisotropic Gaussian splats and isotropic semantic features by embedding high-dimensional semantic priors from models such as SAM and CLIP into a sparse 3D Gaussian Splatting framework. It introduces a General Feature Fusion Architecture, a Non-Differentiable Feature Driving branch, and a Physics-Inspired Dual-Drive Mechanism that couples an external feature loss $L_{gt}$ with an internal clustering loss $L_{cf}$ (alongside a feature loss $L_{feat}$), while preserving per-primitive attributes and real-time rendering with weights $w_i$. The method improves multi-view semantic coherence and geometric reconstruction while reducing training time and preserving efficiency on indoor and outdoor benchmarks. This work demonstrates that non-differentiable feature fusion coupled with physics-inspired regularization can enhance semantic-augmented 3D scene representations for downstream tasks.
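To make the idea of coupling an external and an internal loss concrete, here is a minimal sketch under stated assumptions; it is not the authors' formulation. We assume $L_{gt}$ is a mean cosine distance between the rendered feature map and the 2D model's feature map, and we model $L_{cf}$ as a potential-field-style pull: each primitive is attracted to its $k$ nearest neighbours with an inverse-square (Coulomb-like) strength, scaled by their squared feature difference. The helper names, the weight `lambda_cf`, and the neighbour count `k` are hypothetical, and $L_{feat}$ is not modelled because its definition is not given here.

```python
import numpy as np


def external_feature_loss(rendered: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical L_gt: mean (1 - cosine similarity) over pixels.

    rendered, target: (H, W, C) feature maps (rendered vs. 2D-model features).
    """
    eps = 1e-8
    num = np.sum(rendered * target, axis=-1)
    den = np.linalg.norm(rendered, axis=-1) * np.linalg.norm(target, axis=-1) + eps
    return float(np.mean(1.0 - num / den))


def clustering_loss(positions: np.ndarray, features: np.ndarray, k: int = 4) -> float:
    """Hypothetical L_cf: inverse-square attraction between nearby primitives.

    positions: (N, 3) Gaussian centres; features: (N, C) per-primitive features.
    Requires k < N. Neighbour strengths are normalised to sum to 1 per primitive.
    """
    eps = 1e-6
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)           # a primitive is not its own neighbour
    nbrs = np.argsort(dist2, axis=1)[:, :k]   # indices of the k nearest neighbours
    rows = np.arange(len(positions))[:, None]
    strength = 1.0 / (dist2[rows, nbrs] + eps)            # Coulomb-like 1/d^2 kernel
    strength /= strength.sum(axis=1, keepdims=True)       # keep the scale bounded
    feat_gap = np.sum((features[:, None, :] - features[nbrs]) ** 2, axis=-1)
    return float(np.mean(strength * feat_gap))


def dual_drive_loss(rendered, target, positions, features, lambda_cf=0.1, k=4):
    """External supervision plus weighted internal clustering guidance."""
    return external_feature_loss(rendered, target) + lambda_cf * clustering_loss(
        positions, features, k
    )
```

The normalisation of neighbour strengths is a design choice in this sketch: it keeps the internal term from exploding when two primitives nearly coincide, so that `lambda_cf` alone controls the balance between global alignment and local consistency.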
Abstract
Scene understanding based on 3D Gaussian Splatting (3DGS) has recently achieved notable advances. Although 3DGS-related methods render efficiently, they fail to address the inherent contradiction between the anisotropic color representation of Gaussian primitives and the isotropic requirements of semantic features, which leads to insufficient cross-view feature consistency. To overcome this limitation, we propose $\textit{FHGS}$ (Feature-Homogenized Gaussian Splatting), a novel 3D feature fusion framework inspired by physical models, which maps arbitrary 2D features from pre-trained models to 3D scenes with high precision while preserving the real-time rendering efficiency of 3DGS. Specifically, $\textit{FHGS}$ introduces three innovations. First, a universal feature fusion architecture enables robust embedding of semantic features from large-scale pre-trained models (e.g., SAM, CLIP) into sparse 3D structures. Second, a non-differentiable feature fusion mechanism makes semantic features follow viewpoint-independent, isotropic distributions, which fundamentally balances the anisotropic rendering of Gaussian primitives against the isotropic expression of features. Third, a dual-driven optimization strategy inspired by electric potential fields combines external supervision from semantic feature fields with internal primitive clustering guidance, enabling synergistic optimization of global semantic alignment and local structural consistency. More interactive results are available at https://fhgs.cuastro.org/.
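One way to picture a non-differentiable, viewpoint-independent feature fusion is gradient-free lifting: every pixel's 2D feature is distributed to the Gaussians along its ray in proportion to their alpha-compositing weights, and each Gaussian keeps a single weighted-mean feature vector with no view-dependent terms, so it is isotropic by construction. The sketch below follows that reading only as an assumption; it is not claimed to be the FHGS procedure. We assume the weights $w_i$ are the standard front-to-back blending weights $w_i = \alpha_i \prod_{j<i}(1-\alpha_j)$, and the function names and the `(ids, alphas, pixel_feature)` ray format are hypothetical.

```python
import numpy as np


def blending_weights(alphas: np.ndarray) -> np.ndarray:
    """Front-to-back compositing weights w_i = alpha_i * prod_{j<i} (1 - alpha_j).

    alphas: (K,) opacities of the Gaussians hit by one ray, sorted near to far.
    """
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    return alphas * transmittance


def fuse_features(num_gaussians: int, rays, feat_dim: int) -> np.ndarray:
    """Hypothetical gradient-free lifting of 2D features onto Gaussians.

    rays: iterable of (ids, alphas, pixel_feature), where ids are the indices of
    the Gaussians hit by a pixel's ray (near to far), alphas their opacities, and
    pixel_feature the (C,) 2D-model feature at that pixel.
    Returns one view-independent feature per Gaussian (zeros if never observed).
    """
    acc = np.zeros((num_gaussians, feat_dim))
    wsum = np.zeros(num_gaussians)
    for ids, alphas, pixel_feature in rays:
        ids = np.asarray(ids)
        w = blending_weights(np.asarray(alphas, dtype=float))
        # Unbuffered accumulation, so repeated ids would each be counted.
        np.add.at(acc, ids, w[:, None] * np.asarray(pixel_feature)[None, :])
        np.add.at(wsum, ids, w)
    out = np.zeros_like(acc)
    seen = wsum > 0
    out[seen] = acc[seen] / wsum[seen, None]  # weighted mean over all observations
    return out
```

For example, a Gaussian seen once behind a half-transparent neighbour (weight 0.25, feature `[1, 0]`) and once unoccluded (weight 1.0, feature `[0, 1]`) ends up with the fused feature `[0.2, 0.8]`, regardless of which view it is later rendered from.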
