Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting

Chuandong Liu, Huijiao Wang, Lei Yu, Gui-Song Xia

TL;DR

MixGS tackles large-scale 3D scene reconstruction by replacing divide-and-conquer block training with a holistic optimization that jointly learns a coarse global Gaussian prior and view-aware refinements. It encodes the visible Gaussians with a multi-resolution hash representation, decodes enriched Gaussians with a lightweight MLP, and mixes the decoded Gaussians with the originals to preserve global coherence while restoring fine details. A three-stage training schedule optimizes global structure first, then local detail, and finally refines both jointly, which allows training on a single 24GB GPU and real-time rendering. Experiments on UrbanScene3D and Mill19 show state-of-the-art SSIM and competitive PSNR/LPIPS, and ablations confirm the importance of hash encoding, auxiliary features, offset pooling, and Gaussian mixing for global-local fidelity.
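The three-stage schedule described above can be pictured as a switch over which parameter groups receive gradients. The following is a minimal sketch, not the authors' implementation: the names `StagePlan` and `stage_for_iteration`, the iteration thresholds, and the choice to freeze the original Gaussians during the local-detail stage are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePlan:
    """Which parameter groups are trainable in a stage (hypothetical structure)."""
    name: str
    train_original_gaussians: bool
    train_decoder: bool


def stage_for_iteration(it: int, coarse_iters: int, detail_iters: int) -> StagePlan:
    """Map a training iteration to one of three stages.

    Assumed layout: iterations [0, coarse_iters) fit the global structure,
    the next detail_iters iterations fit local detail, and everything after
    that refines both jointly. The thresholds are illustrative only.
    """
    if it < 0:
        raise ValueError("iteration must be non-negative")
    if it < coarse_iters:
        # Stage 1: only the original (coarse) Gaussians are optimized.
        return StagePlan("global", True, False)
    if it < coarse_iters + detail_iters:
        # Stage 2: only the encoder/decoder is optimized (assumed freeze).
        return StagePlan("local", False, True)
    # Stage 3: both groups are optimized together.
    return StagePlan("joint", True, True)
```

In a training loop, the returned plan would decide which optimizer parameter groups are stepped at each iteration.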

Abstract

Recent advances in 3D Gaussian Splatting have shown remarkable potential for novel view synthesis. However, most existing large-scale scene reconstruction methods rely on the divide-and-conquer paradigm, which often leads to the loss of global scene information and requires complex parameter tuning due to scene partitioning and local optimization. To address these limitations, we propose MixGS, a novel holistic optimization framework for large-scale 3D scene reconstruction. MixGS models the entire scene holistically by integrating camera pose and Gaussian attributes into a view-aware representation, which is decoded into fine-detailed Gaussians. Furthermore, a novel mixing operation combines decoded and original Gaussians to jointly preserve global coherence and local fidelity. Extensive experiments on large-scale scenes demonstrate that MixGS achieves state-of-the-art rendering quality and competitive speed, while significantly reducing computational requirements, enabling large-scale scene reconstruction training on a single 24GB VRAM GPU. The code will be released at https://github.com/azhuantou/MixGS.

Paper Structure

This paper contains 15 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Left: Conventional divide-and-conquer approaches partition the scene into multiple independent blocks, each optimized and rendered separately before merging. This strategy introduces two major limitations: (1) complex parameter tuning, such as selecting the number of blocks and adjusting thresholds (e.g., visibility and intersection [lin2024vastgaussian, liu2024citygaussian]), which often requires extensive manual intervention and scene-specific reconfiguration; (2) loss of global information, leading to inconsistencies in global structure, illumination, and geometric continuity across block boundaries, as local optimizations do not guarantee global coherence. Right: In contrast, our MixGS framework treats the entire scene and the Gaussian Representation network as a holistic optimization problem. By extracting and encoding visible Gaussians, decoding new Gaussians via implicit feature representations, and mixing them with the originals, MixGS achieves both global consistency and fine-grained detail reconstruction.
  • Figure 2: Overview of the proposed MixGS method pipeline. We first train the original Gaussians to capture the coarse information of the scene. Then, based on the camera poses, we extract the Gaussians within the view frustum, thereby enabling implicit feature extraction that integrates Gaussian attributes with camera poses. These features are decoded using a tiny multi-head MLP to generate the decoded Gaussians. Next, the positions of the Gaussian primitives are adjusted via an offset pool and combined with the original Gaussians to form the mixed Gaussians. Finally, the mixed Gaussians are splatted through the differentiable rasterization operation [gs] to render images for supervision.
  • Figure 3: Qualitative comparison of rendering on the Mill19 [turki2022mega] and UrbanScene3D [u3d] datasets. Red boxes mark the selected areas, which are shown in zoomed-in detail.
  • Figure 4: Visualizations of the Gaussians in the view frustum, decoded Gaussians, and mixed Gaussians, along with their corresponding rendered images. It can be observed that rendering with mixed Gaussians leads to better reconstruction of lighting consistency and fine-grained local details.
  • Figure 5: Qualitative rendering results of our method and other methods on the Mill19 [turki2022mega] and UrbanScene3D [u3d] datasets.
  • ...and 1 more figure
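
The extract, offset, and mix steps from the Figure 2 caption can be sketched on Gaussian centers alone. This is a simplified illustration under stated assumptions, not the paper's method:

- Centers are assumed to be already in camera coordinates, with the camera looking along +z.
- `predict_offset` is a hypothetical stand-in for the hash encoder, MLP decoder, and offset pool.
- The mixed set is assumed to be the visible originals followed by the decoded Gaussians.
- Real Gaussians also carry scale, rotation, opacity, and color, which are omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def in_frustum(p: Point, fov_x: float, fov_y: float, near: float, far: float) -> bool:
    """Return True if a camera-space point lies inside a symmetric view frustum.

    fov_x and fov_y are full field-of-view angles in radians.
    """
    x, y, z = p
    if not (near <= z <= far):
        return False
    return abs(x) <= z * math.tan(fov_x / 2) and abs(y) <= z * math.tan(fov_y / 2)


def mix_gaussians(
    centers: Sequence[Point],
    predict_offset: Callable[[Point], Point],  # hypothetical decoder stand-in
    fov_x: float,
    fov_y: float,
    near: float,
    far: float,
) -> List[Point]:
    """Extract visible centers, shift copies by predicted offsets, and mix."""
    # Step 1: keep only the Gaussians inside the view frustum.
    visible = [p for p in centers if in_frustum(p, fov_x, fov_y, near, far)]
    # Step 2: "decode" new Gaussians by adding a predicted positional offset.
    decoded = [
        tuple(c + d for c, d in zip(p, predict_offset(p))) for p in visible
    ]
    # Step 3: mix the visible originals with the decoded Gaussians.
    return visible + decoded
```

The mixed list would then be passed to a differentiable rasterizer; that step is outside the scope of this sketch.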