GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting
Zexu Huang, Min Xu, Stuart Perry
TL;DR
GaussianFocus addresses the redundancy and scalability issues of 3D Gaussian Splatting by introducing a patch-based attention mechanism, Gaussian Sphere Constraints to curb oversized Gaussians, and a subdivision-based reconstruction strategy for large scenes. The method couples a patch attention module with edge and frequency losses, and enforces density-driven constraints on Gaussian sizes, enabling high-fidelity reconstructions and efficient handling of urban-scale environments. Key contributions include 3D Gaussian-Based Patch Attention, Gaussian Sphere Constraints, and Subdivision-Based Reconstruction, which together yield superior rendering quality and scalability compared to state-of-the-art baselines on multiple datasets. The approach significantly reduces artifacts such as air walls, improves detail capture at edges, and enables direct training on large-scale scenes with reduced memory and compute requirements, making it practical for city-scale reconstructions.
Abstract
Recent developments in 3D reconstruction and neural rendering have significantly advanced photo-realistic 3D scene rendering across academic and industrial fields. The 3D Gaussian Splatting technique, alongside its derivatives, integrates the advantages of primitive-based and volumetric representations to deliver top-tier rendering quality and efficiency. Despite these advancements, the method tends to generate an excess of redundant, noisy Gaussians that overfit to individual training views, which degrades rendering quality. Additionally, while 3D Gaussian Splatting excels in small-scale and object-centric scenes, its application to larger scenes is hindered by limited video memory, long optimization times, and appearance that varies across views. To address these challenges, we introduce GaussianFocus, an approach that incorporates a patch attention algorithm to refine rendering quality and a Gaussian constraints strategy to minimize redundancy. Moreover, we propose a subdivision reconstruction strategy for large-scale scenes, dividing them into smaller, manageable blocks that are trained individually. Our results indicate that GaussianFocus significantly reduces unnecessary Gaussians and enhances rendering quality, surpassing existing state-of-the-art (SoTA) methods. Furthermore, we demonstrate that our approach can effectively manage and render large scenes, such as urban environments, whilst maintaining high fidelity in the visual output.
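The subdivision idea mentioned above can be illustrated with a minimal sketch. The abstract does not specify how blocks are chosen, so the uniform ground-plane grid used here, along with the names `subdivide_scene` and `block_size`, are assumptions made purely for illustration and not the authors' actual partitioning scheme.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
BlockKey = Tuple[int, int]


def subdivide_scene(points: Sequence[Point], block_size: float) -> Dict[BlockKey, List[int]]:
    """Group 3D points into square blocks on the x-y plane.

    Hypothetical scheme: a uniform grid with cells of side `block_size`.
    Returns a mapping from grid cell (i, j) to the indices of the points
    that fall inside that cell, so each block can be handled separately.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    blocks: Dict[BlockKey, List[int]] = defaultdict(list)
    for idx, (x, y, _z) in enumerate(points):
        # floor() keeps negative coordinates in the correct cell
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(idx)
    return dict(blocks)


# Usage: four points split into three blocks of side 1.0
points = [(0.5, 0.5, 0.0), (1.5, 0.2, 1.0), (-0.1, 0.3, 2.0), (0.9, 0.9, 3.0)]
blocks = subdivide_scene(points, block_size=1.0)
```

In a sketch like this, each entry of `blocks` would correspond to one independently optimized sub-scene; how neighboring blocks are overlapped or merged is left open here because the abstract does not describe it.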
