Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding

Junyan Su, Baozhu Zhao, Xiaohan Zhang, Qi Liu

TL;DR

Metamon-GS tackles two bottlenecks in anchor-based 3D Gaussian Splatting: insufficient densification in complex regions and unreliable view-dependent color under varied lighting. It introduces a variance-guided densification (VGD) mechanism that allocates additional Gaussians based on color-gradient variance across views, and a Lighting Hash Encoder (LHE) that encodes lighting and directional information via a hash grid, replacing direct view-direction inputs. Together, these components are integrated with anchor embeddings to improve color fidelity and reconstruction, as demonstrated by significant PSNR/SSIM gains and robust qualitative results on multiple datasets, including Mip-NeRF 360. The work demonstrates that variance-aware densification and hash-based lighting representations can substantially enhance novel view synthesis performance for Gaussian-based scene representations, with practical implications for faster, higher-fidelity neural rendering. Future work includes geometry-aware robustness to extreme viewpoints and self-adaptive densification criteria to further automate and stabilize training across diverse scenes.

Abstract

3D Gaussian Splatting (3DGS) has advanced novel view synthesis by representing scenes with Gaussians, and encoding Gaussian point features with anchor embeddings has significantly improved newer 3DGS variants. Despite these advances, further boosting rendering performance remains challenging. One reason is that feature embeddings struggle to represent colors accurately across viewpoints under varying lighting conditions, which leads to a washed-out appearance. Another is the lack of a proper densification strategy: Gaussian point growth is inhibited in sparsely initialized areas, resulting in blurriness and needle-shaped artifacts. To address both issues, we propose Metamon-GS, which combines a variance-guided densification strategy with a multi-level hash grid. The variance-guided densification strategy specifically targets Gaussians with high gradient variance across pixels and compensates important regions with extra Gaussians to improve reconstruction. The multi-level hash grid learns implicit global lighting conditions and, together with the feature embeddings, accurately interprets color from different perspectives. Extensive experiments on publicly available datasets show that Metamon-GS surpasses its baseline model and previous methods, delivering superior quality in rendering novel views.
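
The motivation for a variance-based criterion can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the use of 2D per-pixel gradient vectors, and both thresholds are assumptions chosen for illustration. It shows that when per-pixel gradients point in opposing directions, their sum can cancel to zero even though their variance is large, so a criterion based only on the aggregated gradient norm would skip a Gaussian that a variance test still flags.

```python
import math

def positional_gradient_norm(pixel_grads):
    """Norm of the summed per-pixel 2D gradients (what a norm-only criterion sees)."""
    gx = sum(g[0] for g in pixel_grads)
    gy = sum(g[1] for g in pixel_grads)
    return math.hypot(gx, gy)

def gradient_variance(pixel_grads):
    """Total variance of the per-pixel gradients around their mean (x and y summed)."""
    n = len(pixel_grads)
    mx = sum(g[0] for g in pixel_grads) / n
    my = sum(g[1] for g in pixel_grads) / n
    return sum((g[0] - mx) ** 2 + (g[1] - my) ** 2 for g in pixel_grads) / n

def should_densify(pixel_grads, norm_thresh=2.0, var_thresh=0.5):
    """Hypothetical rule: densify if either the norm or the variance is high."""
    return (positional_gradient_norm(pixel_grads) > norm_thresh
            or gradient_variance(pixel_grads) > var_thresh)

# Case 1: all pixels pull the Gaussian in the same direction.
consistent = [(1.0, 0.0)] * 4
# Case 2: pixels pull in opposing directions, e.g. over a complex texture.
divergent = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

for name, grads in (("consistent", consistent), ("divergent", divergent)):
    print(name,
          "norm =", positional_gradient_norm(grads),
          "variance =", gradient_variance(grads),
          "densify =", should_densify(grads))
```

In the divergent case the summed gradient is exactly zero, so only the variance term triggers densification. The actual quantity whose variance the method tracks, and how it is thresholded, should be taken from the paper itself.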

Paper Structure

This paper contains 15 sections, 6 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: In certain areas, the COLMAP-generated SfM point cloud is relatively sparse, as indicated by the red box in (a). Using this point cloud to initialize the Gaussians and applying the original clone-and-split density control strategy may leave some areas without enough Gaussians, ultimately causing inadequate scene reconstruction, as shown in the corresponding region highlighted by the red box in (b). This greatly reduces the overall quality of novel view synthesis, especially in areas with intricate geometry or delicate details.
  • Figure 2: Overview of Metamon-GS. Our method enhances 3DGS with two key innovations: (1) integration of a view-direction-aware hash encoding to learn view-dependent features, e.g., lighting conditions, and (2) a variance-guided densification strategy based on the variance of the color gradient during backpropagation. We first interpolate view-dependent feature embeddings from a hash grid, concatenate them with anchor embeddings, and then feed them into the color MLP. Other features are decoded similarly to Scaffold-GS. This encoding, combined with our novel densification strategy, results in more accurate color representation and a more efficient use of Gaussians for scene representation.
  • Figure 3: Pixel-wise gradients and overall gradients of Gaussians under two different cases of under-reconstruction. (a) In an ideal scenario, gradients of pixels consistently converge, forming a high Gaussian positional gradient, which allows effective densification. (b) In contrast, in a scenario with complex textures, gradients diverge in different directions, resulting in a lower Gaussian positional gradient, hindering proper densification.
  • Figure 4: Pre-experiment on 3D-GS with our proposed VGD strategy. We employ an identical starting point for the reconstruction process. After 600 iterations, the curves for 3D-GS with VGD begin to diverge significantly from those without VGD. For 3D-GS without VGD, the high gradient variance curves (blue and yellow) remain relatively elevated as densification progresses. In contrast, for 3D-GS with VGD, the corresponding high gradient variance curves decrease rapidly, while the low gradient variance curves (green and red) increase. These results show that our proposed densification method successfully reduces color discrepancies among pixels within Gaussians.
  • Figure 5: Illustration of the Lighting Hash Encoder projected into 2D. (a) We index the hash grid by … Distant Gaussians are normalized and projected onto a unit sphere. The vertices of the hash grid cells intersected by this unit sphere represent the hash grid parameters that will be optimized during training. This allows for the effective encoding of view-dependent lighting information.
  • ...and 1 more figure
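
The data flow described in the captions for Figures 2 and 5 (project onto a unit sphere, interpolate a feature from a hash grid, concatenate it with the anchor embedding, pass the result to the color MLP) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the `HashGrid` class, its level count, resolutions, table size, and feature width are hypothetical, the hash follows the XOR-of-primes scheme popularized by Instant-NGP, and using the unit-sphere point as the grid query is itself an assumption. The color MLP is omitted; the sketch stops at building its input vector.

```python
import math
import random

# Spatial-hash primes in the style of Instant-NGP (an assumption for this sketch).
PRIMES = (1, 2654435761, 805459861)

def project_to_unit_sphere(p):
    """Normalize a 3D point so that it lies on the unit sphere."""
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)

class HashGrid:
    """Hypothetical multi-level hash grid with trilinear interpolation."""

    def __init__(self, levels=4, base_res=4, table_size=2 ** 10, feat_dim=2, seed=0):
        rng = random.Random(seed)
        self.resolutions = [base_res * 2 ** l for l in range(levels)]
        self.table_size = table_size
        self.feat_dim = feat_dim
        # One table of learnable feature vectors per level (randomly initialized here).
        self.tables = [
            [[rng.uniform(-1e-2, 1e-2) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(levels)
        ]

    def _index(self, ix, iy, iz):
        h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
        return h % self.table_size

    def encode(self, p):
        """Encode a point in [-1, 1]^3 as concatenated per-level features."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            x = [(c + 1.0) * 0.5 * res for c in p]             # map to [0, res]
            x0 = [min(int(math.floor(c)), res - 1) for c in x]  # lower cell corner
            w = [c - c0 for c, c0 in zip(x, x0)]                # offsets inside the cell
            feat = [0.0] * self.feat_dim
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        weight = ((w[0] if dx else 1.0 - w[0])
                                  * (w[1] if dy else 1.0 - w[1])
                                  * (w[2] if dz else 1.0 - w[2]))
                        entry = table[self._index(x0[0] + dx, x0[1] + dy, x0[2] + dz)]
                        for k in range(self.feat_dim):
                            feat[k] += weight * entry[k]
            out.extend(feat)
        return out

grid = HashGrid()
query = project_to_unit_sphere((3.0, 4.0, 12.0))  # a distant point, norm 13
anchor_embedding = [0.1] * 8                      # hypothetical 8-dim anchor feature
color_mlp_input = grid.encode(query) + anchor_embedding
print(len(color_mlp_input))  # 4 levels * 2 features + 8 anchor dims
```

Because only grid cells near the unit sphere are ever queried in this setup, only the table entries they hash to would receive gradients during training, which matches the caption's remark about which hash grid parameters are optimized.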