Optimizing 3D Gaussian Splatting for Mobile GPUs

Md Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal

TL;DR

This work targets real-time 3D scene reconstruction on mobile GPUs by optimizing 3D Gaussian Splatting (3DGS) through Texture3dgs, a pipeline tuned for 2D texture memory. The central contribution is a texture-memory-aware sorting kernel, supported by data-layout strategies (a layout transformation and stage fusion) and complemented by variable packing and tile-based rendering. Empirical results show speedups of up to 4.1× in sorting and up to 1.7× end to end, with memory usage reduced by up to 1.6×, demonstrating practical applicability on resource-constrained mobile devices. The approach enables privacy-preserving, offline, latency-sensitive 3D reconstruction suitable for AR, robotics, and autonomous systems on mobile hardware.

Abstract

Image-based 3D scene reconstruction, which transforms multi-view images into a structured 3D representation of the surrounding environment, is a common task across many modern applications. 3D Gaussian Splatting (3DGS) is a new paradigm for this problem and is considerably more efficient than previous methods. Motivated by this, and considering the benefits of mobile device deployment (data privacy, operation without internet connectivity, and potentially faster responses), this paper develops Texture3dgs, an optimized mapping of 3DGS to a mobile GPU. A critical challenge in this area turns out to be optimizing for the two-dimensional (2D) texture cache, which must be exploited for faster execution on mobile GPUs. Because sorting dominates the computation in 3DGS on mobile platforms, the core of Texture3dgs is a novel sorting algorithm in which processing, data movement, and data placement are highly optimized for 2D memory. The properties of this algorithm are analyzed using a cost model for the texture cache. In addition, we accelerate other steps of the 3DGS algorithm through improved variable layout design and other optimizations. End-to-end evaluation shows that Texture3dgs delivers up to 4.1$\times$ and 1.7$\times$ speedup for sorting and overall 3D scene reconstruction, respectively, while also reducing memory usage by up to 1.6$\times$, demonstrating the effectiveness of our design for efficient mobile 3D scene reconstruction.

Paper Structure

This paper contains 21 sections, 15 figures, 6 tables, 3 algorithms.

Figures (15)

  • Figure 1: 3DGS rendering pipelines.
  • Figure 2: Mobile GPU memory hierarchy.
  • Figure 3: Bitonic sorting network illustration: (a) comparator operations, where two elements are compared and swapped based on their order, and (b) sorting process across multiple steps, where at each stage, the array is conceptually partitioned into sorted segments (length of $2^{step}$). The arrows indicate the comparisons and exchanges at each step.
  • Figure 4: GPUTeraSort's sort at (a) stage 2, step 2 and (b) stage 2, step 1; the quad sizes are 4 and 2, respectively. During each comparison step, a value from the green region is paired with its corresponding value from the yellow region. After comparison, the minimum and maximum values are placed in the corresponding green and yellow positions, respectively.
  • Figure 5: GPUTeraSort’s texture memory allocation for different quad sizes: $b$ is the texture cache block size, $B$ is the quad size, $W$ and $H$ are texture dimensions. This figure shows three cases: (a) quad is along a single dimension, with length less than $b$, (b) the quad is along a single dimension, with length greater than $b$, and (c) the quad is two-dimensional with length $H$, and width $B/H$.
  • ...and 10 more figures
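
The bitonic sorting network described in the Figure 3 caption can be illustrated with a minimal CPU sketch. This is the textbook in-place bitonic sort on a 1D Python list, not the texture-aware kernel of Texture3dgs or GPUTeraSort's quad-based variant. The names `bitonic_sort`, `segment`, and `distance` are chosen here for illustration, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Illustrative sketch only: every comparator in one inner pass is
    independent of the others, which is why the network maps well to GPUs.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("bitonic_sort assumes a power-of-two length")
    a = list(values)

    segment = 2  # length of the sorted segments produced by this stage
    while segment <= n:
        distance = segment // 2  # gap between the two inputs of a comparator
        while distance >= 1:
            for i in range(n):
                partner = i ^ distance
                if partner > i:  # handle each comparator pair once
                    # Segments alternate between ascending and descending
                    # so that neighbouring segments form a bitonic sequence.
                    ascending = (i & segment) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            distance //= 2
        segment *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 5, 1, 8, 2, 6, 4]))
```

Each inner pass touches every element exactly once, with a fixed access pattern that depends only on `distance`. That regularity is what makes data placement in 2D texture memory the deciding factor for performance.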