FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

TL;DR

FlashGS addresses the bottlenecks of real-time rendering with 3D Gaussian Splatting on consumer GPUs by introducing a precise per-tile intersection algorithm, adaptive scheduling, and comprehensive memory and instruction-level optimizations. The approach balances compute and memory across preprocess and render stages, reduces redundant Gaussian-tile work, and lowers memory footprint without sacrificing image quality. Extensive evaluation shows FlashGS delivering up to 4x speedups (and up to ~30x on 4K matrices) with substantial memory savings across a range of real-world scenes, enabling real-time rendering of large-scale, high-resolution environments. The work contributes a practical, open-source CUDA Python pipeline and demonstrates substantial practical impact for real-time scene exploration and related applications.

Abstract

This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are meticulously integrated to amplify the performance of the rasterization process. An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration over mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the superior performance and resource optimization capabilities of FlashGS, positioning it as a formidable tool in the domain of 3D rendering.

Paper Structure (40 sections, 7 equations, 17 figures, 2 tables, 4 algorithms)

Figures (17)

  • Figure 1: Two representative rendering output images with 3D Gaussian Splatting (3DGS) and our FlashGS.
  • Figure 2: 3DGS Overview
  • Figure 3: Runtime breakdown of 3DGS rasterization on the MatrixCity [li2023matrixcity] dataset.
  • Figure 4: We evaluate the key-value pair binning process while rendering 6 frames of a scene trained on the MatrixCity [li2023matrixcity] dataset. The number of assigned k-v pairs far exceeds the number of tiles actually covered by the AABB or the projected ellipse.
  • Figure 5: Geometry Redundancies. There are 3 kinds of redundancies in the original 3DGS intersection algorithm: I. The definition of the ellipse ignores the opacity. II. The AABB is over-estimated. III. Tiles outside the ellipse are binned with the Gaussian.
  • ...and 12 more figures
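
The geometry redundancy described in the Figure 5 caption (tiles inside a Gaussian's bounding box that the projected ellipse never touches) can be illustrated with a small sketch. This is not the FlashGS implementation: it is a plain-Python illustration that assumes a 16-pixel tile, ignores image bounds and opacity, and defines the ellipse as the set of points whose quadratic form under the inverse 2D covariance is at most `r2`. All function names here are hypothetical.

```python
import math

TILE = 16  # tile edge length in pixels (assumed for this sketch)


def quad(dx, dy, inv):
    """Evaluate d^T * inv * d for a symmetric 2x2 matrix inv = (p, q, s)."""
    p, q, s = inv
    return p * dx * dx + 2.0 * q * dx * dy + s * dy * dy


def min_on_segment(p0, p1, mean, inv):
    """Minimum of the quadratic form over the segment p0 -> p1 (closed form)."""
    ex, ey = p1[0] - p0[0], p1[1] - p0[1]
    dx, dy = p0[0] - mean[0], p0[1] - mean[1]
    p, q, s = inv
    # f(t) = quad(d + t*e) is a convex parabola in t; clamp its vertex to [0, 1].
    num = ex * (p * dx + q * dy) + ey * (q * dx + s * dy)
    den = quad(ex, ey, inv)
    t = min(1.0, max(0.0, -num / den))
    return quad(dx + t * ex, dy + t * ey, inv)


def tile_hits_ellipse(tx, ty, mean, inv, r2):
    """Exact test: does tile (tx, ty) intersect the ellipse?"""
    x0, y0 = tx * TILE, ty * TILE
    x1, y1 = x0 + TILE, y0 + TILE
    if x0 <= mean[0] <= x1 and y0 <= mean[1] <= y1:
        return True  # ellipse centre lies inside the tile
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    # Otherwise the closest point (in the ellipse metric) lies on a tile edge.
    return any(
        min_on_segment(corners[i], corners[(i + 1) % 4], mean, inv) <= r2
        for i in range(4)
    )


def count_tiles(mean, cov, r2=9.0):
    """Return (tiles covered by the ellipse's AABB, tiles the ellipse touches)."""
    a, b, c = cov  # covariance [[a, b], [b, c]]
    det = a * c - b * b
    inv = (c / det, -b / det, a / det)
    hx, hy = math.sqrt(r2 * a), math.sqrt(r2 * c)  # tight AABB half-extents
    tx0, tx1 = math.floor((mean[0] - hx) / TILE), math.floor((mean[0] + hx) / TILE)
    ty0, ty1 = math.floor((mean[1] - hy) / TILE), math.floor((mean[1] + hy) / TILE)
    aabb = (tx1 - tx0 + 1) * (ty1 - ty0 + 1)
    exact = sum(
        tile_hits_ellipse(tx, ty, mean, inv, r2)
        for ty in range(ty0, ty1 + 1)
        for tx in range(tx0, tx1 + 1)
    )
    return aabb, exact
```

For a thin, diagonally oriented Gaussian such as `count_tiles((100.0, 100.0), (400.0, 380.0, 400.0))`, the AABB count is noticeably larger than the exact count, which is the kind of redundant Gaussian-tile pair the caption refers to.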