GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction

Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Kyuwon Kim, M. Kerim Yucel

TL;DR

GSta addresses efficiency bottlenecks in Gaussian Splatting (GS) for monocular 3D reconstruction with a gradient-driven freezing scheme that identifies converged Gaussians from their joint xyz and rgb gradient norms. It adds early stopping based on PSNR measured on a subset of training images, a plateau-based learning-rate scheduler, per-splat rasterizer/optimizer changes, and a bag of orthogonal tricks that further reduce training cost. The method plugs into existing GS methods and improves the Pareto front in training time, storage, and peak memory while maintaining competitive rendering quality; combined with Trick-GS, it achieves up to 5x faster training and 16x smaller disk size than vanilla GS while consuming only half the peak memory. Evaluations on Mip-NeRF-360, Tanks&Temples, and DeepBlending demonstrate its effectiveness and its compatibility with other efficiency techniques.

Abstract

Gaussian Splatting (GS) is a popular approach for 3D reconstruction, mostly due to its ability to converge reasonably fast, represent the scene faithfully and render (novel) views quickly. However, it suffers from large storage and memory requirements, and its training speed still lags behind hash-grid based radiance field approaches (e.g. Instant-NGP), which makes it especially difficult to deploy in robotics scenarios, where 3D reconstruction is crucial for accurate operation. In this paper, we propose GSta, which dynamically identifies Gaussians that have converged well during training, based on their positional and color gradient norms. By forcing such Gaussians into a siesta and stopping their updates (freezing) during training, we improve training speed with competitive accuracy compared to the state of the art. We also propose an early stopping mechanism based on the PSNR values computed on a subset of training images. Combined with other improvements, such as integrating a learning rate scheduler, GSta achieves an improved Pareto front in convergence speed, memory and storage requirements, while preserving quality. We also show that GSta can improve other methods and complement orthogonal approaches to efficiency improvement; once combined with Trick-GS, GSta achieves up to 5x faster training and 16x smaller disk size compared to vanilla GS, while having comparable accuracy and consuming only half the peak memory. More visualisations are available at https://anilarmagan.github.io/SRUK-GSta.
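
The freezing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds `xyz_tol` and `rgb_tol`, the rule that both norms must fall below their threshold, and the plain SGD update are all assumptions made for illustration. Gradients are given as plain Python lists, one vector per Gaussian.

```python
import math

def grad_norm(vec):
    """Euclidean (L2) norm of one gradient vector."""
    return math.sqrt(sum(g * g for g in vec))

def update_siesta_mask(xyz_grads, rgb_grads, frozen, xyz_tol=1e-4, rgb_tol=1e-3):
    """Return an updated per-Gaussian frozen mask.

    Assumption: a Gaussian counts as converged when BOTH its positional
    and its color gradient norms are below hypothetical thresholds.
    Assumption: once frozen, a Gaussian stays frozen until a separate
    unfreeze step resets the mask.
    """
    new_frozen = []
    for g_xyz, g_rgb, was_frozen in zip(xyz_grads, rgb_grads, frozen):
        converged = grad_norm(g_xyz) < xyz_tol and grad_norm(g_rgb) < rgb_tol
        new_frozen.append(was_frozen or converged)
    return new_frozen

def masked_sgd_step(params, grads, frozen, lr=0.1):
    """Plain SGD update (a stand-in optimizer) that skips frozen Gaussians."""
    updated = []
    for param, grad, is_frozen in zip(params, grads, frozen):
        if is_frozen:
            updated.append(list(param))  # siesta: parameters left untouched
        else:
            updated.append([p - lr * g for p, g in zip(param, grad)])
    return updated

# Two Gaussians: the first has tiny gradients, the second is still moving.
xyz_grads = [[1e-6, 0.0, 2e-6], [0.02, 0.01, 0.0]]
rgb_grads = [[1e-5, 1e-5, 0.0], [0.1, 0.0, 0.0]]
frozen = update_siesta_mask(xyz_grads, rgb_grads, [False, False])
positions = masked_sgd_step([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], xyz_grads, frozen)
print(frozen)  # [True, False]
```

In this toy run only the second Gaussian is updated. In a real pipeline the saving would come from skipping the frozen Gaussians in the backward pass and the optimizer step, which this sketch does not model.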

Paper Structure

This paper contains 15 sections, 7 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Mean gradient magnitude of Gaussians during 30000 iterations of training. Training takes $6\times$ longer because of the hard-coded number of training iterations in 3DGS [kerbl3Dgaussians], even though a large number of Gaussians have already converged in early iterations. Note the scale difference between the positional parameters and the rest of the parameters.
  • Figure 2: Our proposed GSta training strategy. During training, we observe the gradients of Gaussian parameters (position xyz and color rgb are used in practice) to decide which Gaussians have converged. We then progressively freeze (i.e. stop training) converged Gaussians until we either hit our proposed training-set based early stopping criterion or finish training. We then unfreeze all Gaussians and finetune them for a few iterations for global alignment. GSta leads to reduced training time, disk size and peak memory consumption.
  • Figure 3: Qualitative comparison of the methods (top left to bottom right: 3DGS [kerbl3Dgaussians], Mini-Splatting-v2 [fang2024mini2], Taming-GS [mallick2024taming3dgs] and ours). Our method recovers a more consistent background while keeping training time low and storage compression rates high. We show zoomed-in prediction images, except for the GT.
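
The training-set based early stopping mentioned in the Figure 2 caption can be sketched as follows. The exact criterion is not stated in this summary, so the patience-based rule, the class name `PsnrEarlyStopper`, and the parameters `patience` and `min_delta` are hypothetical; only the PSNR formula itself is standard. Images are represented as flat lists of pixel values in [0, 1].

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

class PsnrEarlyStopper:
    """Hypothetical patience rule: stop once the mean PSNR on a fixed
    training subset has not improved by min_delta for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.05):
        self.patience = patience
        self.min_delta = min_delta
        self.best = -math.inf
        self.stale_checks = 0

    def should_stop(self, mean_psnr):
        if mean_psnr > self.best + self.min_delta:
            self.best = mean_psnr
            self.stale_checks = 0
        else:
            self.stale_checks += 1
        return self.stale_checks >= self.patience

# Made-up mean-PSNR values from periodic checks on the training subset.
history = [20.0, 24.0, 25.0, 25.02, 25.03, 25.01, 26.0]
stopper = PsnrEarlyStopper(patience=3, min_delta=0.05)
stop_index = next(i for i, v in enumerate(history) if stopper.should_stop(v))
print(stop_index)  # 5
```

Following the schedule in the Figure 2 caption, a stop signal like this would be followed by unfreezing all Gaussians and finetuning them for a few iterations.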