FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering
Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai
TL;DR
FlashGS addresses the bottlenecks of real-time rendering with 3D Gaussian Splatting on consumer GPUs by introducing a precise per-tile intersection algorithm, adaptive scheduling, and memory- and instruction-level optimizations. The approach balances compute and memory across the preprocess and render stages, reduces redundant Gaussian-tile work, and lowers memory footprint without sacrificing image quality. Evaluation shows FlashGS delivering an average 4x speedup (and up to ~30x at 4K resolution) with reduced memory consumption across a range of synthetic and real-world scenes, enabling real-time rendering of large-scale, high-resolution environments. The work contributes an open-source CUDA Python library aimed at real-time scene exploration and related applications.
Abstract
This work introduces FlashGS, an open-source CUDA Python library designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on observations from a comprehensive analysis of the rendering process, with the aim of enhancing computational efficiency and bringing the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are integrated to improve the performance of the rasterization process. An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration on mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the performance and resource optimization capabilities of FlashGS, positioning it as a practical tool in the domain of 3D rendering.
