
GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting

Zexu Huang, Min Xu, Stuart Perry

TL;DR

GaussianFocus addresses the redundancy and scalability issues of 3D Gaussian Splatting by introducing a patch-based attention mechanism, Gaussian Sphere Constraints to curb oversized Gaussians, and a subdivision-based reconstruction strategy for large scenes. The method couples a patch attention module with edge and frequency losses, and enforces density-driven constraints on Gaussian sizes, enabling high-fidelity reconstructions and efficient handling of urban-scale environments. Key contributions include 3D Gaussian-Based Patch Attention, Gaussian Sphere Constraints, and Subdivision-Based Reconstruction, which together yield superior rendering quality and scalability compared to state-of-the-art baselines on multiple datasets. The approach significantly reduces artifacts such as air walls, improves detail capture at edges, and enables direct training on large-scale scenes with reduced memory and compute requirements, making it practical for city-scale reconstructions.
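The TL;DR mentions density-driven constraints on Gaussian sizes but does not give the rule. The Python sketch below shows one way such a cap could work, under stated assumptions. Each Gaussian's per-axis scale (the diagonal of the scaling matrix $S$ in the 3DGS covariance $\Sigma = R S S^T R^T$) is clamped to a multiple of the mean distance to its k nearest neighbours, so Gaussians in dense regions cannot balloon into large blobs. The neighbour rule and the names `constrain_scales`, `k`, and `factor` are hypothetical; this is not the authors' formulation.

```python
import math


def knn_mean_distance(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Brute force O(N^2), kept for readability. A lone point gets math.inf
    (no neighbours, so no cap).
    """
    means = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        means.append(sum(nearest) / len(nearest) if nearest else math.inf)
    return means


def constrain_scales(points, scales, k=3, factor=1.5):
    """Clamp every per-axis scale of each Gaussian to factor * local spacing.

    Hypothetical density rule: a smaller kNN distance (denser region) gives a
    tighter cap. Scales are linear here; real 3DGS code stores log-scales.
    """
    caps = [factor * d for d in knn_mean_distance(points, k)]
    return [tuple(min(s, cap) for s in axes) for axes, cap in zip(scales, caps)]


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
sc = [(0.5, 0.5, 5.0), (0.2, 0.2, 0.2), (3.0, 0.1, 0.1), (4.0, 4.0, 4.0)]
print(constrain_scales(pts, sc, k=2))
```

A production version would use a spatial index (e.g. a k-d tree) instead of the quadratic search and would apply the clamp to the optimised log-scales during training.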

Abstract

Recent developments in 3D reconstruction and neural rendering have significantly advanced photo-realistic 3D scene rendering across academic and industrial fields. 3D Gaussian Splatting (3DGS) and its derivatives combine the advantages of primitive-based and volumetric representations to deliver top-tier rendering quality and efficiency. Despite these advances, the method tends to generate an excessive number of redundant, noisy Gaussians that overfit to individual training views, which degrades rendering quality. Additionally, while 3DGS excels in small-scale and object-centric scenes, its application to larger scenes is hindered by limited video memory, long optimization times, and appearance variation across views. To address these challenges, we introduce GaussianFocus, which incorporates a patch attention algorithm to refine rendering quality and a Gaussian constraint strategy to minimize redundancy. Moreover, we propose a subdivision reconstruction strategy for large-scale scenes that divides them into smaller, manageable blocks for individual training. Our results indicate that GaussianFocus significantly reduces unnecessary Gaussians and enhances rendering quality, surpassing existing state-of-the-art (SoTA) methods. Furthermore, we demonstrate that our approach can effectively manage and render large scenes, such as urban environments, whilst maintaining high visual fidelity.
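The abstract describes the subdivision strategy only as dividing large scenes into smaller blocks that are trained individually. A minimal Python sketch, assuming axis-aligned square blocks on the ground (x, y) plane with z pointing up and a hypothetical overlap margin shared by neighbouring blocks, assigns point or camera positions to blocks as follows. `assign_blocks`, `block_size`, and `overlap` are illustrative names; the paper's actual partitioning scheme may differ.

```python
import math
from collections import defaultdict


def assign_blocks(positions, block_size, overlap=0.0):
    """Assign 3D positions (points or camera centres) to ground-plane blocks.

    Block (i, j) covers x in [i*B, (i+1)*B) and y in [j*B, (j+1)*B); each block is
    expanded by `overlap` on every side, so positions near a border are also
    listed in the neighbouring block(s). Assumes z is the up axis.
    """
    blocks = defaultdict(list)
    for idx, (x, y, _z) in enumerate(positions):
        # Range of block indices whose expanded extent contains this position.
        i_lo, i_hi = math.floor((x - overlap) / block_size), math.floor((x + overlap) / block_size)
        j_lo, j_hi = math.floor((y - overlap) / block_size), math.floor((y + overlap) / block_size)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                blocks[(i, j)].append(idx)
    return dict(blocks)


pos = [(0.5, 0.5, 0.0), (9.9, 0.5, 0.0), (10.1, 0.5, 0.0), (25.0, 25.0, 3.0)]
print(assign_blocks(pos, block_size=10.0, overlap=0.5))
```

With a positive `overlap`, positions near a block boundary appear in both neighbouring blocks, giving each independently trained block some shared context at its edges; with `overlap=0.0` the partition is disjoint.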

Paper Structure

This paper contains 18 sections, 11 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: GaussianFocus. As illustrated by the red and yellow boxes in the images, our method consistently surpasses the 3DGS model in various scenes, showing distinct advantages in challenging environments characterized by slender geometries, intricate details, and lighting effects.
  • Figure 2: Overview of GaussianFocus: Our model monitors the size of Gaussian spheres during initialization and training. Constraints are applied to the scaling matrix $S$ within the covariance matrix to prevent the excessive growth of the Gaussian spheres. Subsequently, the rendered image is divided into 64 parts. Each part independently calculates its attention values, which are then concatenated to form a comprehensive attention map. This map is multiplied back onto the original rendered image to produce an attention-enhanced image. Finally, this enhanced image and the original rendered image undergo multiple loss calculations against the ground truth. These include reconstruction ($L_1$), structural similarity ($L_{D-SSIM}$), edge ($L_{Edge}$), and frequency ($L_{Frequency}$) losses.
  • Figure 3: Subdivision-Based Reconstruction of Large Scenes Procedure. Our method divides large scenes into blocks for reconstruction.
  • Figure 4: Qualitative Comparison Results on the Mip-NeRF 360 Dataset barron2022mip. These models were trained using images downsampled by a factor of eight and then rendered at full resolution to show zoom-in and close-up quality. Compared with previous approaches, our model achieves higher accuracy and finer detail, rendering images that are almost identical to the ground truth.
  • Figure 5: Training Progression on the Villa Dataset. We present the quality of the reconstructed villa scene at different training iterations. Compared to the SoTA Mip-Splatting yu2024mip, our method not only converges faster but also achieves better reconstruction quality with less noise.
  • ...and 2 more figures
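Figure 2 describes the patch attention pipeline only at the level of data flow: split the render into 64 parts, compute attention per part, concatenate the results, multiply the map back onto the render, and compare both images with the ground truth. The NumPy sketch below mirrors that flow under explicit assumptions. The caption does not say how attention values are computed, so a fixed, non-learned scoring rule stands in for it. The edge loss (L1 on finite-difference gradients), the frequency loss (L1 on FFT amplitude spectra), and the weights `w_edge` and `w_freq` are likewise hypothetical choices, and the D-SSIM term is omitted for brevity.

```python
import numpy as np


def patch_attention(img, grid=8):
    """Split an H x W x C image into grid x grid patches (64 when grid=8), compute an
    attention map for each patch independently, and stitch the maps together.

    Hypothetical scoring rule (not from the paper): a pixel's score is its L1
    deviation from the patch mean colour, softmax-normalised within the patch
    and rescaled so each patch's weights average to 1.
    """
    h, w, _ = img.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    attn = np.empty((h, w))
    for i in range(grid):
        for j in range(grid):
            rows, cols = slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw)
            patch = img[rows, cols]
            score = np.abs(patch - patch.mean(axis=(0, 1))).sum(axis=-1)
            e = np.exp(score - score.max())  # numerically stable softmax
            attn[rows, cols] = e / e.sum() * e.size
    return attn


def edge_loss(pred, gt):
    """L1 distance between horizontal and vertical finite-difference gradients."""
    gx = np.abs(np.diff(pred, axis=1) - np.diff(gt, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(gt, axis=0)).mean()
    return gx + gy


def frequency_loss(pred, gt):
    """L1 distance between per-channel 2-D FFT amplitude spectra."""
    fp = np.abs(np.fft.fft2(pred, axes=(0, 1)))
    fg = np.abs(np.fft.fft2(gt, axes=(0, 1)))
    return np.abs(fp - fg).mean()


def total_loss(rendered, gt, w_edge=0.1, w_freq=0.01):
    """Apply L1, edge and frequency terms to both the render and the
    attention-enhanced render (hypothetical weighting; D-SSIM omitted)."""
    enhanced = rendered * patch_attention(rendered)[..., None]
    loss = 0.0
    for img in (rendered, enhanced):
        loss += np.abs(img - gt).mean()
        loss += w_edge * edge_loss(img, gt)
        loss += w_freq * frequency_loss(img, gt)
    return loss, enhanced


rng = np.random.default_rng(0)
render, target = rng.random((64, 64, 3)), rng.random((64, 64, 3))
print(total_loss(render, target)[0])
```

Rescaling each patch's weights to average 1 keeps the attention-enhanced image at roughly the same brightness as the render, so the attention reweights pixels within a patch rather than globally darkening or brightening it.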