
FLICKER: A Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting

Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Dongjun Wu, Freddy Ziyang Hong, Chik Patrick Yue

TL;DR

FLICKER is a contribution-aware 3DGS accelerator built on hardware-software co-design. It integrates adaptive leader pixels, pixel-rectangle grouping, hierarchical Gaussian testing, and a mixed-precision architecture to enable near pixel-level, contribution-driven rendering with minimal overhead.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has emerged as a mainstream rendering technique due to its photorealistic quality and low latency. However, processing massive numbers of non-contributing Gaussian points introduces significant computational overhead on resource-limited edge platforms, limiting its deployment in next-generation AR/VR devices. Contribution-based prior skipping alleviates this inefficiency, yet the resulting contribution-testing workload becomes prohibitive for edge execution. In this paper, we present FLICKER, a contribution-aware 3DGS accelerator based on hardware-software co-design. The proposed framework integrates adaptive leader pixels, pixel-rectangle grouping, hierarchical Gaussian testing, and a mixed-precision architecture to enable near pixel-level, contribution-driven rendering with minimal overhead. Experimental results demonstrate up to $1.5\times$ speedup, $2.6\times$ improvement in energy efficiency, and $14\%$ area reduction compared with a state-of-the-art accelerator. Compared with a representative edge GPU, FLICKER achieves a $19.8\times$ speedup and $26.7\times$ higher energy efficiency.
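To make the notion of a "non-contributing" Gaussian concrete, the following is a minimal sketch of a per-pixel contribution test. It follows the alpha computation and the 1/255 cutoff of the vanilla 3DGS rasterizer, not FLICKER's hardware test, which the abstract does not specify. The function names and the `(a, b, c)` conic layout are assumptions chosen for illustration.

```python
import math

# Cutoff below which the vanilla 3DGS rasterizer skips a Gaussian at a pixel.
ALPHA_MIN = 1.0 / 255.0


def gaussian_alpha(px, py, mean, conic, opacity):
    """Alpha of one projected 2D Gaussian at pixel (px, py).

    mean  : (mx, my) projected center in pixel coordinates
    conic : (a, b, c), entries of the inverse 2D covariance [[a, b], [b, c]]
    """
    dx = px - mean[0]
    dy = py - mean[1]
    a, b, c = conic
    # Exponent of the Gaussian: -0.5 * d^T * Sigma^{-1} * d
    power = -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy
    if power > 0.0:
        # A positive exponent indicates a numerically invalid conic; skip it.
        return 0.0
    return min(0.99, opacity * math.exp(power))


def contributes(px, py, mean, conic, opacity):
    """True if the Gaussian would be blended at this pixel."""
    return gaussian_alpha(px, py, mean, conic, opacity) >= ALPHA_MIN
```

A Gaussian that fails `contributes` for every pixel in a tile is pure overhead for that tile. Running this test for every pixel-Gaussian pair is the contribution-testing workload that the abstract describes as prohibitive on edge devices.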

Paper Structure

This paper contains 16 sections, 2 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) Rendering speed and (b) hardware utilization of the rendering kernel in vanilla 3DGS, profiled on RTX 3090 [nvidia2023rtx3090] and Jetson Xavier NX [nvidia2023jetsonxavier]. In (b), CU denotes the utilization of compute units (i.e., GPU SMs), reflecting overall computation activity, while FP indicates the achieved FP32 performance relative to the device peak performance.
  • Figure 2: (a) Overall rendering pipeline of 3DGS [kerbl20233d], and (b) comparison of three intersection methods: AABB in vanilla 3DGS, OBB in GSCore [lee2024gscore], and the proposed Mini-Tile CAT.
  • Figure 3: Mini-Tile CAT algorithm optimization: (a) adaptive leader pixels, and (b) pixel-rectangle grouping. In (a), the PSNR of vanilla 3DGS is 25.56, while the Uniform-Dense mode shows negligible loss. Although smooth Gaussians account for 43%, the Smooth-Focused mode achieves higher PSNR, indicating that the contribution of smooth Gaussians is more significant in this case. In (b), the Mini-Tile CAT of a Pixel Rectangle (PR) can be simplified by exploiting its coordinate symmetry.
  • Figure 4: Per-pixel processed Gaussians across intersection strategies and duplicate Gaussians across tile sizes.
  • Figure 5: Overall hardware architecture of FLICKER. The key component, the contribution-aware test unit (CTU), is highlighted in purple and is detailed in the section on the mixed-precision contribution-aware test unit.
  • ...and 5 more figures
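
The caption of Figure 3(b) says the test for a Pixel Rectangle can be simplified through coordinate symmetry. One possible reading, sketched below, is that the four corners of an axis-aligned rectangle share only two distinct x offsets and two distinct y offsets, so the squared terms of the Gaussian exponent can be computed once and reused. This is an illustrative assumption, not the paper's actual datapath; `rect_powers` and its parameters are hypothetical names.

```python
def rect_powers(x0, x1, y0, y1, mean, conic):
    """Gaussian exponents at the four corners of an axis-aligned pixel rectangle.

    Corners are (x_i, y_j) for i, j in {0, 1}. conic = (a, b, c) holds the
    entries of the inverse 2D covariance [[a, b], [b, c]].
    """
    a, b, c = conic
    dx = (x0 - mean[0], x1 - mean[0])
    dy = (y0 - mean[1], y1 - mean[1])
    # Only two x-terms and two y-terms are needed for all four corners.
    qx = [-0.5 * a * d * d for d in dx]
    qy = [-0.5 * c * d * d for d in dy]
    # The cross term still depends on the (i, j) pair.
    return {
        (i, j): qx[i] + qy[j] - b * dx[i] * dy[j]
        for i in (0, 1)
        for j in (0, 1)
    }
```

Under this assumption, the four corner tests need four squared-term evaluations instead of eight. Each returned exponent can then be compared against a contribution threshold in the usual way.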