Vorion: A RISC-V GPU with Hardware-Accelerated 3D Gaussian Rendering and Training

Yipeng Wang, Mengtian Yang, Chieh-pu Lo, Jaydeep P. Kulkarni

TL;DR

Vorion targets real-time 3D Gaussian Splatting (3DGS) by introducing hardware-accelerated rendering and training via a unified Gaussian rasterizer. The key ideas are large-tile processing, z-tiling for depth parallelism, and a hybrid Gaussian/pixel dataflow that mitigates the alpha blending and gradient accumulation bottlenecks. Hardware experiments on a 16 nm prototype show 19 FPS rendering in the minimal configuration, with up to 152 FPS rendering and 38.6 iterations/s training as resources scale, indicating near-linear scalability. These results suggest real-time 3DGS is feasible on next-generation GPUs, enabling edge AR/VR, robotics, and dynamic scene capture.

Abstract

3D Gaussian Splatting (3DGS) has recently emerged as a foundational technique for real-time neural rendering, 3D scene generation, and volumetric video (4D) capture. However, its rendering and training are computationally heavy, making real-time rendering on edge devices and real-time 4D reconstruction on workstations currently infeasible. Given its fixed-function nature and similarity to traditional rasterization, 3DGS presents a strong case for dedicated hardware in the graphics pipeline of next-generation GPUs. This work, Vorion, presents the first GPGPU prototype with hardware-accelerated 3DGS rendering and training. Vorion features a scalable architecture, minimal hardware changes to traditional rasterizers, z-tiling to increase parallelism, and a hybrid Gaussian/pixel-centric dataflow. We prototype the minimal system (8 SIMT cores, 2 Gaussian rasterizers) using TSMC 16nm FinFET technology, which achieves 19 FPS for rendering. The scaled design with 16 rasterizers achieves 38.6 iterations/s for training.
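
The abstract does not spell out how z-tiling works, so the following is only a minimal sketch of the general property that makes depth-wise parallelism possible: front-to-back alpha blending is associative, so a depth-sorted list of contributions can be split into segments, each segment blended independently, and the partial results merged in order. The function names (`blend_segment`, `merge`, `blend_z_tiled`), the scalar color, and the equal-size split are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# One Gaussian's contribution at a pixel: (color, alpha), sorted front to back.
# A scalar color is used for brevity; RGB works the same way per channel.
Sample = Tuple[float, float]
# Partial blend result: (accumulated color, remaining transmittance).
Partial = Tuple[float, float]


def blend_segment(samples: List[Sample]) -> Partial:
    """Sequential front-to-back alpha blending of one depth segment."""
    color, transmittance = 0.0, 1.0
    for c, a in samples:
        color += transmittance * a * c
        transmittance *= 1.0 - a
    return color, transmittance


def merge(front: Partial, back: Partial) -> Partial:
    """Composite a nearer partial result over a farther one."""
    front_color, front_t = front
    back_color, back_t = back
    return front_color + front_t * back_color, front_t * back_t


def blend_z_tiled(samples: List[Sample], num_segments: int) -> Partial:
    """Split the depth-sorted list into segments, blend each on its own,
    then merge the partial results front to back.

    The per-segment blends have no dependency on each other, so they
    could run in parallel; here they run in a plain loop.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    size = max(1, -(-len(samples) // num_segments))  # ceiling division
    partials = [
        blend_segment(samples[i:i + size])
        for i in range(0, len(samples), size)
    ]
    result: Partial = (0.0, 1.0)  # identity: no color, full transmittance
    for partial in partials:
        result = merge(result, partial)
    return result


if __name__ == "__main__":
    demo = [(1.0, 0.5), (0.5, 0.25), (0.2, 0.8), (0.9, 0.1)]
    print(blend_segment(demo))
    print(blend_z_tiled(demo, 2))
```

Both calls in the usage example should agree up to floating-point rounding, which is the point of the sketch: splitting along depth changes the order of evaluation but not the blended result.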

Paper Structure

This paper contains 13 sections, 9 equations, 9 figures.

Figures (9)

  • Figure 1: Rendering and training runtime breakdown on edge and server GPUs; total Gaussian invocations vs. tile size; fraction of occluded pixels vs. blending progress (depth).
  • Figure 2: 3D Gaussian Splatting rendering and training pipeline.
  • Figure 3: Overall Vorion GPGPU architecture.
  • Figure 4: Traditional rasterizer architecture; Gaussian rasterizer architecture in rendering setup; Dataflow chart for rendering.
  • Figure 5: Gaussian rasterizer architecture in training setup; Training dataflow chart.
  • ...and 4 more figures