VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment

Qing Li, Huifang Feng, Xun Gong, Yu-Shen Liu

TL;DR

This work addresses geometry reconstruction in 3D Gaussian Splatting by introducing VA-GS, a view-aligned framework that augments image-based supervision with edge cues, visibility-aware multi-view alignment, normal-based constraints, and cross-view deep feature consistency. The method jointly optimizes five losses (edge-aware image reconstruction, normal alignment, normal smoothing, photometric multi-view alignment, and feature alignment) in a single objective that enforces cross-view geometric consistency while mitigating illumination artifacts. Experiments on DTU, Tanks and Temples (TNT), and Mip-NeRF 360 demonstrate state-of-the-art performance in both surface reconstruction and novel view synthesis. The approach yields more accurate meshes and photorealistic renderings under challenging lighting and boundary conditions, with potential applications in 3D modeling and AR/VR.

Abstract

3D Gaussian Splatting has recently emerged as an efficient solution for high-quality and real-time novel view synthesis. However, its capability for accurate surface reconstruction remains underexplored. Due to the discrete and unstructured nature of Gaussians, supervision based solely on image rendering loss often leads to inaccurate geometry and inconsistent multi-view alignment. In this work, we propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment (VA). Specifically, we incorporate edge-aware image cues into the rendering loss to improve surface boundary delineation. To enforce geometric consistency across views, we introduce a visibility-aware photometric alignment loss that models occlusions and encourages accurate spatial relationships among Gaussians. To further mitigate ambiguities caused by lighting variations, we incorporate normal-based constraints to refine the spatial orientation of Gaussians and improve local surface estimation. Additionally, we leverage deep image feature embeddings to enforce cross-view consistency, enhancing the robustness of the learned geometry under varying viewpoints and illumination. Extensive experiments on standard benchmarks demonstrate that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis. The source code is available at https://github.com/LeoQLi/VA-GS.
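
As a reading aid, the five losses named above can be combined in one objective. The form below is only a plausible sketch, using the loss symbols from the Figure 2 caption; the weighted-sum structure and the weights $\lambda_{nc}$, $\lambda_{ns}$, $\lambda_{p}$, $\lambda_{f}$ are assumptions made here for illustration, and the paper's actual objective may differ.

$$
\mathcal{L} = \mathcal{L}_{I} + \lambda_{nc}\,\mathcal{L}_{nc} + \lambda_{ns}\,\mathcal{L}_{ns} + \lambda_{p}\,\mathcal{L}_{p} + \lambda_{f}\,\mathcal{L}_{f}
$$

Here $\mathcal{L}_{I}$ is taken to be the edge-aware image reconstruction loss, $\mathcal{L}_{nc}$ and $\mathcal{L}_{ns}$ the normal alignment and normal smoothing terms, and $\mathcal{L}_{p}$ and $\mathcal{L}_{f}$ the photometric and feature multi-view alignment terms.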

Paper Structure

This paper contains 18 sections, 12 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Our method addresses illumination and boundary artifacts that previous methods fail to resolve.
  • Figure 2: Overview of our method. Training uses five loss functions: $\mathcal{L}_{I}$, $\mathcal{L}_{nc}$, $\mathcal{L}_{ns}$, $\mathcal{L}_{p}$ and $\mathcal{L}_{f}$. The occlusion weight $\omega$, visibility term $\upsilon$ and homography matrix $\bm{H}$ are used in $\mathcal{L}_{p}$ and $\mathcal{L}_{f}$. The image features $\bm{F}_s$ and $\bm{F}_r$ are extracted using a pretrained network $f$. $\bm{K}$ and $\bm{M}$ are the intrinsic and extrinsic parameter matrices of the camera view.
  • Figure 3: Visual comparison of surface reconstruction results on the TNT dataset. Our method can handle shadows and large indoor flat regions. GS-Pull reconstructs only the foreground objects.
  • Figure 4: Visual comparison of surface reconstruction results on the Mip-NeRF 360 dataset. Our approach effectively handles the challenges posed by cluttered lighting and boundaries.
  • Figure 5: Visual comparison of surface reconstruction results on the Deep Blending dataset. Our method effectively handles the challenges posed by complex lighting conditions and ambiguous boundaries. GS-Pull is omitted as it fails to produce reasonable reconstructions.
  • ...and 3 more figures
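
The Figure 2 caption mentions a homography matrix $\bm{H}$ built from the camera parameters and used in the multi-view alignment losses. The sketch below illustrates the standard plane-induced homography that such a warp is commonly based on; it is not taken from the paper's code. The plane convention ($\bm{n}^\top \bm{X} = d$ in the reference camera frame), the pose convention ($\bm{X}_s = \bm{R}\bm{X}_r + \bm{t}$), and all function names are assumptions made for this example, and plain Python lists are used only to keep it dependency-free.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Multiply a matrix by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inv_intrinsics(K):
    """Closed-form inverse of a zero-skew pinhole intrinsic matrix."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]

def plane_homography(K_r, K_s, R, t, n, d):
    """H = K_s (R + t n^T / d) K_r^{-1}.

    Assumed conventions (hypothetical, for illustration):
    the plane is n^T X = d in the reference camera frame (d != 0),
    and X_s = R X_r + t maps reference to source camera coordinates.
    """
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K_s, M), inv_intrinsics(K_r))

def project(K, X):
    """Pinhole projection of a 3D point in camera coordinates."""
    u, v, w = matvec(K, X)
    return (u / w, v / w)

def warp(H, pixel):
    """Map a reference pixel to the source image through H."""
    u, v, w = matvec(H, [pixel[0], pixel[1], 1.0])
    return (u / w, v / w)

# Toy setup: shared intrinsics, small rotation about the y-axis, sideways shift.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
a = 0.1
R = [[math.cos(a), 0.0, math.sin(a)],
     [0.0, 1.0, 0.0],
     [-math.sin(a), 0.0, math.cos(a)]]
t = [0.2, 0.0, 0.0]
n, d = [0.0, 0.0, 1.0], 2.0           # fronto-parallel plane at depth 2
X_r = [0.4, -0.2, 2.0]                # a point on that plane (n . X_r == d)
X_s = [x + ti for x, ti in zip(matvec(R, X_r), t)]

H = plane_homography(K, K, R, t, n, d)
print("warped via H:   ", warp(H, project(K, X_r)))
print("direct in source:", project(K, X_s))
```

For a point that lies on the plane, the two printed pixel locations agree up to floating-point error. A photometric or feature alignment loss would typically sample the source image (or feature map) at the warped locations and compare it with the reference; the occlusion weight $\omega$ and visibility term $\upsilon$ from the caption are not modeled in this sketch.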