RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong

TL;DR

RetinaGS tackles the scalability bottleneck of dense 3D Gaussian Splatting by introducing a model-parallel training framework that preserves the single-GPU rendering equation across distributed workers. It partitions the scene into overlapping convex subspaces via a KD-tree, computes subset-level colors and opacities in parallel, and merges them in the correct order to reproduce the full rendering result, enabling training with billions of primitives on multi-GPU clusters. The approach demonstrates consistent gains in rendering quality (PSNR/SSIM/LPIPS) as primitive counts increase and as training resolution and data scale rise, with a pioneering billion-primitive model trained on MatrixCity-ALL. The work also analyzes partitioning strategies and demonstrates substantial memory and throughput benefits of model parallelism over data parallelism, while acknowledging remaining challenges in load balancing and initialization throughput for future improvements.
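The merging step described above can be illustrated on a single ray. The sketch below is not the authors' implementation: it uses NumPy, the function names `composite` and `merge_partials` are hypothetical, and it assumes that every sample on the ray belongs to exactly one subset and that the subsets are visited near-to-far. Under those assumptions, compositing each subset independently and then merging the partial colors and opacities gives the same result as compositing all samples at once.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted samples on one ray.

    Returns the accumulated color and the accumulated opacity.
    """
    color = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        color += transmittance * a * c
        transmittance *= 1.0 - a
    return color, 1.0 - transmittance

def merge_partials(partials):
    """Merge per-subset (color, opacity) pairs, ordered near-to-far.

    Each partial color is already alpha-weighted inside its subset, so it
    only needs to be attenuated by the opacity of the subsets in front of it.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for c, a in partials:
        color += transmittance * c
        transmittance *= 1.0 - a
    return color, 1.0 - transmittance

# Illustrative data: 12 depth-sorted samples along one ray (values are made up).
rng = np.random.default_rng(0)
colors = rng.random((12, 3))
alphas = rng.random(12) * 0.9

# Reference: composite every sample in a single pass.
full_color, full_alpha = composite(colors, alphas)

# Split the ray into three contiguous depth ranges (hypothetical boundaries),
# composite each range independently, then merge in depth order.
bounds = [0, 5, 8, 12]
partials = [
    composite(colors[start:end], alphas[start:end])
    for start, end in zip(bounds[:-1], bounds[1:])
]
merged_color, merged_alpha = merge_partials(partials)

print(np.allclose(merged_color, full_color), np.isclose(merged_alpha, full_alpha))
```

The equivalence holds because the transmittance in front of a subset is the product of `1 - opacity` over all earlier subsets, which equals the product of `1 - alpha` over all earlier samples. Merging in the wrong depth order would in general break this identity.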

Abstract

In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS in terms of primitive numbers and training resolutions that were difficult to explore before and surpass previous state-of-the-art reconstruction quality. We observe a clear positive trend of increasing visual quality when increasing primitive numbers with our method. We also demonstrate the first attempt at training a 3DGS model with more than one billion primitives on the full MatrixCity dataset that attains a promising visual quality.

Paper Structure (25 sections, 13 equations, 11 figures, 13 tables)

This paper contains 25 sections, 13 equations, 11 figures, 13 tables.

Figures (11)

  • Figure 1: Left: Datasets of different sizes require varying levels of computational power and numbers of 3DGS primitives. Larger and higher-resolution datasets can no longer be trained on a single GPU, which limits the pursuit of scale and fidelity in 3DGS reconstruction. Right: On the MatrixCity-ALL dataset, the billion-level model, trained via our distributed modeling with 64 GPUs, brings a better visual experience than the million-level model.
  • Figure 2: Using planes generated by a KD-tree, we spatially partition the initial 3DGS model into a set of sub-models. These sub-models share primitives only when those primitives cross partition boundaries. The rendering results of the sub-models are then merged to form the final rendered image. After the loss is computed on the merged image, the corresponding gradients are returned to each sub-model to update its primitive parameters.
  • Figure 3: PSNR vs. #GS analysis on various datasets. The markers represent the 3DGS baseline, and the curves denote our method. Results at various training resolutions are upsampled and evaluated at full resolution to unify the metrics. Our method achieves superior PSNR simply by naively scaling the model size. This trend is more pronounced on the high-fidelity dataset ScanNet++ and the larger-scale dataset MatrixCity-M.
  • Figure 4: Visualization of models with various numbers of primitives and training resolutions on the Garden and ScanNet++ datasets (top and bottom metrics: training resolution/splat count/PSNR). As the camera moves close to objects or zooms in, higher training resolutions and more primitives help maintain rendering clarity and reveal more detail, which brings a better visual experience and better quantitative results than the 3DGS baseline.
  • Figure 5: Comparative Analysis of Densification Strategies. The curves denote our MVS initialization, showing a clear positive correlation between the number of primitives and visual quality. Baseline results from the 3DGS method with its default densification threshold of 0.0002 are marked as black dots. Lower thresholds were tested to assess their impact on point densification compared to the default setting. The findings suggest that reducing the threshold does not consistently increase model size and fails to outperform our method.
  • ...and 6 more figures