Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

Xiaohan Zhang, Zhenyu Sun, Yukui Qiu, Junyan Su, Qi Liu

TL;DR

Toy-GS tackles rendering along large-scale free camera trajectories by adaptively partitioning the scene and the camera set into regions aligned with the camera poses, so that a local Gaussian model can be trained for each region in parallel. It couples this adaptive partitioning with Patchmatch-guided Gaussian placement and position-aware point adaptive control (PPAC) for scale, followed by a local-global fusion that leverages regional texture for distant views. On SCUTic and public datasets, Toy-GS achieves state-of-the-art rendering quality while substantially reducing GPU memory, improving PSNR by 1.19 dB and saving about 7 GB of GPU memory relative to baselines. The approach enables high-fidelity rendering of uneven, large-scale trajectories in realistic scenes and offers a practical path toward scalable Gaussian splatting in large environments.
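The reported gains are measured in PSNR, defined as PSNR = 10 log10(MAX² / MSE). A standard identity (not specific to Toy-GS) follows: raising PSNR by Δ dB means multiplying the mean squared error by 10^(-Δ/10). The short sketch below applies that identity to the 1.19 dB figure; the baseline MSE of 0.004 is an arbitrary placeholder, not a value from the paper.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which MSE must shrink to raise PSNR by delta_db decibels."""
    return 10.0 ** (-delta_db / 10.0)


ratio = mse_ratio_for_gain(1.19)
print(f"MSE ratio: {ratio:.3f}")  # about 0.760, i.e. roughly 24% lower MSE

# Sanity check: scaling any baseline MSE by the ratio yields a 1.19 dB gain.
baseline_mse = 0.004  # placeholder value
print(f"{psnr(baseline_mse * ratio) - psnr(baseline_mse):.2f} dB")
```

In other words, a 1.19 dB improvement corresponds to about a quarter less squared error per pixel, independent of the baseline MSE.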

Abstract

Currently, 3D rendering for large-scale free camera trajectories, namely arbitrary input camera trajectories, poses significant challenges: 1) the distribution and observation angles of the cameras are irregular, and the free trajectories cover various types of scenes; 2) processing the entire point cloud and all images at once for a large-scale scene requires a substantial amount of GPU memory. This paper presents Toy-GS, a method for accurately rendering large-scale free camera trajectories. Specifically, we propose an adaptive spatial division approach for free trajectories that divides the cameras and the sparse point cloud of the entire scene into multiple regions according to the camera poses. Training a local Gaussian for each region in parallel lets us concentrate on texture details while minimizing GPU memory usage. Next, we use a multi-view constraint and position-aware point adaptive control (PPAC) to improve the rendering quality of texture details. In addition, our regional fusion approach combines local and global Gaussians to improve rendering quality as the number of divided areas increases. Extensive experiments confirm the effectiveness and efficiency of Toy-GS, which achieves state-of-the-art results on two public large-scale datasets as well as our SCUTic dataset. Compared with various baselines, Toy-GS improves PSNR by 1.19 dB and saves 7 GB of GPU memory.
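To make the idea of dividing cameras and the sparse point cloud "according to camera poses" concrete, here is a minimal sketch. It is not the authors' adaptive rule, which the abstract does not specify: as a hypothetical stand-in it recursively halves the camera set at the median of its wider ground-plane axis (assuming +y is vertical), then gives each region the sparse points inside its camera bounding box padded by a `margin`. The names `Region`, `split_cameras`, `divide_scene`, `max_cams`, and `margin` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Region:
    camera_ids: List[int]
    # Axis-aligned bounds on the assumed ground plane: (xmin, xmax, zmin, zmax).
    bounds: Tuple[float, float, float, float]


def split_cameras(centers: Sequence[Vec3], ids: List[int], max_cams: int) -> List[List[int]]:
    """Recursively halve a camera group at the median of its wider ground-plane axis."""
    if len(ids) <= max_cams:
        return [ids]
    xs = [centers[i][0] for i in ids]
    zs = [centers[i][2] for i in ids]
    axis = 0 if max(xs) - min(xs) >= max(zs) - min(zs) else 2
    ordered = sorted(ids, key=lambda i: centers[i][axis])
    mid = len(ordered) // 2
    return (split_cameras(centers, ordered[:mid], max_cams)
            + split_cameras(centers, ordered[mid:], max_cams))


def divide_scene(centers: Sequence[Vec3], points: Sequence[Vec3],
                 max_cams: int = 4, margin: float = 1.0):
    """Group cameras by position, then assign each region the points inside its padded bounds."""
    if max_cams < 1:
        raise ValueError("max_cams must be at least 1")
    if not centers:
        return [], []
    regions, point_sets = [], []
    for group in split_cameras(centers, list(range(len(centers))), max_cams):
        xs = [centers[i][0] for i in group]
        zs = [centers[i][2] for i in group]
        xmin, xmax = min(xs) - margin, max(xs) + margin
        zmin, zmax = min(zs) - margin, max(zs) + margin
        regions.append(Region(group, (xmin, xmax, zmin, zmax)))
        # Padding makes neighbouring regions overlap, so border points appear in both.
        point_sets.append([p for p in points if xmin <= p[0] <= xmax and zmin <= p[2] <= zmax])
    return regions, point_sets


# Usage: eight cameras on a line along x, sparse points spread along the same line.
cams = [(float(i), 0.0, 0.0) for i in range(8)]
pts = [(x + 0.5, 0.0, 0.0) for x in range(-1, 8)]
regions, point_sets = divide_scene(cams, pts, max_cams=4)
print([r.camera_ids for r in regions])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A median split keeps the number of cameras per region balanced, which is one simple way to bound per-region training memory; the overlapping margins are an assumption meant to ease later merging of local Gaussians.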

Paper Structure

This paper contains 16 sections, 9 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Toy-GS improves rendering quality while reducing GPU memory consumption, illustrated on an example scene from the datasets. (a) The rendering result of the original Gaussian Splatting. (b) Adding the multi-view constraint and position-aware point adaptive control to 3DGS improves the rendering precision of the background and texture details, raising PSNR by 0.74 dB. (c) Dividing the entire scene into three areas and training a separate local Gaussian for each region improves PSNR by 5.56 dB while reducing GPU memory by 5 GB. We design a local-global rendering method that fully utilizes the texture information of the local Gaussians to enhance the rendering result.
  • Figure 2: The pipeline of Toy-GS. First, we adaptively divide the cameras and the point cloud into multiple areas based on the camera poses, so that the point cloud distribution in each region aligns with the distribution of camera poses. Next, we use the multi-view constraint to improve the rendering of texture details and the position-aware point adaptive control to improve the rendering of distant objects, leading to better overall results. Finally, our local-global rendering combines local and global Gaussians to improve rendering accuracy and reduce GPU memory consumption as the number of areas increases.
  • Figure 3: Adaptive camera selection strategy for an area. For each camera, we project all points onto the image plane and filter out points that are not visible to that camera. We then check whether enough of the visible points lie within the area to decide whether to select the camera.
  • Figure 4: Comparison of different rendering methods. VastGaussian prunes each region's 3DGS, which leaves holes in the rendered images. When there are few viewpoints, training 3DGS on the entire scene outperforms VastGaussian. Our approach makes maximal use of the local Gaussians to enhance rendering quality.
  • Figure 5: Visual comparisons with recent methods on our SCUTic dataset. Our method renders texture details and distant objects more accurately.
  • ...and 4 more figures
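The camera-selection rule described in Figure 3 can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: a pinhole model with a world-to-camera rotation `R` and translation `t`, a visibility test that only checks the view frustum (occlusion is ignored), and a hypothetical `min_fraction` threshold on the share of a camera's visible points that fall inside the area. `PinholeCamera`, `select_cameras`, and `in_area` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class PinholeCamera:
    R: Mat3  # world-to-camera rotation, row-major (assumed convention)
    t: Vec3  # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

    def project(self, p: Vec3) -> Optional[Tuple[float, float]]:
        """Return pixel (u, v) if p is in front of the camera and inside the image, else None."""
        xc, yc, zc = (sum(self.R[r][k] * p[k] for k in range(3)) + self.t[r] for r in range(3))
        if zc <= 1e-6:  # point lies behind (or on) the camera plane
            return None
        u = self.fx * xc / zc + self.cx
        v = self.fy * yc / zc + self.cy
        if 0.0 <= u < self.width and 0.0 <= v < self.height:
            return (u, v)
        return None


def select_cameras(cameras: Sequence[PinholeCamera], points: Sequence[Vec3],
                   in_area: Callable[[Vec3], bool], min_fraction: float = 0.3) -> List[int]:
    """Keep cameras for which enough of their visible points fall inside the area."""
    selected = []
    for idx, cam in enumerate(cameras):
        # Frustum test only: occlusion between points is ignored in this sketch.
        visible = [p for p in points if cam.project(p) is not None]
        if not visible:
            continue
        inside = sum(1 for p in visible if in_area(p))
        if inside / len(visible) >= min_fraction:
            selected.append(idx)
    return selected


# Usage: cam0 sits at the origin facing +z; cam1 sits at x = 100 facing +z.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cam0 = PinholeCamera(IDENTITY, (0.0, 0.0, 0.0), 100.0, 100.0, 50.0, 50.0, 100, 100)
cam1 = PinholeCamera(IDENTITY, (-100.0, 0.0, 0.0), 100.0, 100.0, 50.0, 50.0, 100, 100)
pts = [(0.0, 0.0, 5.0), (0.5, 0.0, 5.0), (100.0, 0.0, 5.0), (100.5, 0.0, 5.0), (99.5, 0.0, 5.0)]
print(select_cameras([cam0, cam1], pts, lambda p: p[0] < 50.0))  # [0]
```

Here the area is x < 50, so only `cam0`, whose visible points all lie in the area, is selected; `cam1` sees only points outside it. A fractional threshold is one plausible reading of "enough points"; an absolute count would work the same way.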