Optimizing 3D Gaussian Splatting for Mobile GPUs
Md Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal
TL;DR
This work targets real-time 3D scene reconstruction on mobile GPUs by optimizing 3D Gaussian Splatting (3DGS) through Texture3dgs, a pipeline tuned for 2D texture memory. The central contributions are a texture-memory-aware sorting kernel and data-layout strategies, including a layout transformation and stage fusion, complemented by variable packing and tile-based rendering. Empirical results show up to 4.1× speedup in sorting and up to 1.7× speedup end to end, with memory usage reduced by up to 1.6×, demonstrating practical applicability on resource-constrained mobile devices. The approach enables privacy-preserving, offline, latency-sensitive 3D reconstruction suitable for AR, robotics, and autonomous systems on mobile hardware.
Abstract
Image-based 3D scene reconstruction, which transforms multi-view images into a structured 3D representation of the surrounding environment, is a common task across many modern applications. 3D Gaussian Splatting (3DGS) is a new paradigm for this problem and is considerably more efficient than previous methods. Motivated by this, and by the benefits of on-device deployment (data privacy, operation without internet connectivity, and potentially faster responses), this paper develops Texture3dgs, an optimized mapping of 3DGS to a mobile GPU. A critical challenge turns out to be optimizing for the two-dimensional (2D) texture cache, which must be exploited for fast execution on mobile GPUs. Because sorting dominates the computation in 3DGS on mobile platforms, the core of Texture3dgs is a novel sorting algorithm in which processing, data movement, and data placement are highly optimized for 2D memory. The properties of this algorithm are analyzed using a cost model for the texture cache. In addition, we accelerate other steps of the 3DGS algorithm through improved variable layout design and other optimizations. End-to-end evaluation shows that Texture3dgs delivers up to 4.1$\times$ and 1.7$\times$ speedup for sorting and overall 3D scene reconstruction, respectively, while also reducing memory usage by up to 1.6$\times$, demonstrating the effectiveness of our design for efficient mobile 3D scene reconstruction.
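To give a feel for why data placement in 2D memory matters, the following is a minimal illustrative sketch, not the Texture3dgs algorithm or its cost model. It assumes a hypothetical texture cache that fetches square 4×4 texel blocks, and it counts how many blocks a run of 16 consecutive sort keys touches under two hypothetical layouts: plain row-major placement and a blocked placement. The function names, the block size, and the texture width are all assumptions made for illustration.

```python
def row_major_coord(i, width):
    """Place element i of a linear array at (x, y) in row-major order."""
    return i % width, i // width


def blocked_coord(i, width, block=4):
    """Place element i so that each run of block*block consecutive
    elements fills one block x block square of the texture.
    Assumes width is a multiple of block (hypothetical layout)."""
    texels_per_block = block * block
    blocks_per_row = width // block
    b, r = divmod(i, texels_per_block)          # block index, offset inside block
    bx, by = b % blocks_per_row, b // blocks_per_row
    return bx * block + r % block, by * block + r // block


def blocks_touched(coords, block=4):
    """Count distinct block x block cache blocks covered by the coordinates."""
    return len({(x // block, y // block) for x, y in coords})


width = 1024          # assumed texture width in texels
run = range(16)       # 16 consecutive elements, e.g. one chunk of sort keys

row_major_blocks = blocks_touched(row_major_coord(i, width) for i in run)
blocked_blocks = blocks_touched(blocked_coord(i, width) for i in run)

print("row-major blocks touched:", row_major_blocks)
print("blocked blocks touched:  ", blocked_blocks)
```

Under these assumptions, the row-major run spans four cache blocks and uses only a quarter of each, while the blocked layout keeps the same run inside a single block. Real mobile texture caches differ in block shape and size, so the numbers here only illustrate the kind of effect a 2D-aware layout can have.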
