Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration

Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, Jongse Park

TL;DR

This work addresses the sorting bottleneck in real-time on-device 3D Gaussian Splatting (3DGS) with Neo, a hardware-software co-design that exploits temporal redundancy through reuse-and-update sorting. The software flow incrementally updates the Gaussian ordering across frames, while the hardware accelerator speeds up both sorting and rasterization with a tile- and subtile-based pipeline. Neo achieves up to 10.0x higher throughput than an edge GPU and up to 5.6x higher than GSCore, and cuts DRAM traffic by up to 94.5%, enabling real-time on-device rendering at QHD resolution. These gains make high-quality, low-latency rendering for immersive AR/VR experiences more practical on resource-constrained devices.

Abstract

Real-time 3D Gaussian Splatting (3DGS) rendering on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences. However, existing solutions struggle to achieve high frame rates, especially for high-resolution rendering. Our analysis identifies the sorting stage in the 3DGS rendering pipeline as the major bottleneck due to its high memory bandwidth demand. This paper presents Neo, which introduces a reuse-and-update sorting algorithm that exploits temporal redundancy in Gaussian ordering across consecutive frames, and devises a hardware accelerator optimized for this algorithm. By efficiently tracking and updating Gaussian depth ordering instead of re-sorting from scratch, Neo significantly reduces redundant computations and memory bandwidth pressure. Experimental results show that Neo achieves up to 10.0x and 5.6x higher throughput than state-of-the-art edge GPU and ASIC solutions, respectively, while reducing DRAM traffic by 94.5% and 81.3%. These improvements make high-quality and low-latency on-device 3D rendering more practical.
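
The idea of updating an existing depth order instead of re-sorting from scratch can be illustrated with a small sketch. This is not Neo's algorithm or its hardware dataflow, which this summary does not detail; it is a plain insertion sort seeded with the previous frame's order, chosen because insertion sort does little work on nearly sorted input. The function name `reuse_and_update_order`, the dictionary-based depth table, and the sample depths are all hypothetical, and the sketch assumes the same set of Gaussians is visible in both frames.

```python
from typing import Dict, List, Tuple


def reuse_and_update_order(
    prev_order: List[int], depths: Dict[int, float]
) -> Tuple[List[int], int]:
    """Sort Gaussian IDs front-to-back by their current depth,
    starting from the previous frame's order (hypothetical helper).

    Returns the updated order and the number of element shifts,
    a rough proxy for how much the ordering changed.
    """
    order = list(prev_order)  # reuse last frame's result as the starting point
    shifts = 0
    for i in range(1, len(order)):
        gid = order[i]
        key = depths[gid]
        j = i - 1
        # Move Gaussians that are now farther away one slot to the right.
        while j >= 0 and depths[order[j]] > key:
            order[j + 1] = order[j]
            j -= 1
            shifts += 1
        order[j + 1] = gid
    return order, shifts


# Frame 1: full sort (illustrative depths, not measured data).
depths_f1 = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
order_f1 = sorted(depths_f1, key=depths_f1.get)

# Frame 2: the camera moved slightly, so only Gaussians 2 and 3 swap.
depths_f2 = {0: 1.1, 1: 2.1, 2: 3.6, 3: 3.5, 4: 5.2}
order_f2, shifts = reuse_and_update_order(order_f1, depths_f2)

print(order_f2, shifts)
```

When consecutive frames produce similar orderings, the shift count stays small relative to the number of Gaussians, which is the kind of temporal redundancy the abstract refers to. A real renderer would also need to handle Gaussians entering and leaving the view, which this sketch omits.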

Paper Structure

This paper contains 27 sections, 1 equation, 19 figures, 4 tables, 1 algorithm.

Figures (19)

  • Figure 1: Reuse opportunities in the sorting stage of 3D Gaussian Splatting (3DGS) inference. The figure illustrates how the Gaussian order across three consecutive frames (F1, F2, and F3) exhibits significant temporal similarities.
  • Figure 2: Brief overview of 3D Gaussian Splatting.
  • Figure 3: Throughput comparison with different resolutions.
  • Figure 4: Throughput comparison across core counts and DRAM bandwidth when rendering at QHD resolution. Each colored label denotes the corresponding FPS performance.
  • Figure 5: DRAM traffic (GB) required for rendering 60 frames and breakdown of memory bandwidth consumption across 3DGS pipeline stages.
  • ...and 14 more figures