RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering

Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, Yingyan Celine Lin

TL;DR

RT-NeRF targets real-time on-device NeRF rendering for AR/VR. Profiling existing efficient NeRF methods identifies two core bottlenecks: uniform point sampling, and dense accesses to and computations on the embeddings. The solution is two-pronged: an RT-NeRF algorithm that directly computes the geometry of pre-existing points from non-zero occupancy-grid cubes and uses a coarse-grained view-dependent ordering to skip invisible points, and a dedicated hardware accelerator with a hybrid sparse encoding scheme and specialized decoding units that exploit sparsity. The approach yields throughput gains of 9.7× to 3,201× over SOTA efficient NeRF solutions while preserving rendering quality, and improves energy efficiency on both edge and cloud hardware. This work demonstrates a viable algorithm–hardware co-design path for real-time NeRF, enabling immersive on-device AR/VR experiences and laying a foundation for future sparsity-aware accelerators.

Abstract

Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive interactions enabled by real-time (> 30 FPS) NeRF-based rendering are still limited by the low throughput achievable on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of this inefficiency: (1) uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline that largely alleviates the inefficiency of the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained view-dependent computing ordering scheme to eliminate the unnecessary processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme that adaptively switches between a bitmap-based and a coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a dual-purpose bi-direction adder & search tree and a high-density sparse search unit to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, which achieves a large throughput improvement (e.g., 9.7× to 3,201×) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.
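The hybrid encoding idea in the abstract can be illustrated with a minimal sketch that picks, per matrix, whichever of the two sparsity formats needs fewer bits. The cost model here (one bitmap bit per element, a flattened index per non-zero, 16-bit values) and all function names are assumptions made for illustration; the source does not specify the accelerator's actual selection rule or bit widths.

```python
def count_nonzeros(matrix):
    """Number of non-zero entries in a list-of-lists matrix."""
    return sum(1 for row in matrix for v in row if v != 0)


def bitmap_bits(rows, cols, nnz, value_bits=16):
    # Assumed cost model: 1 presence bit per element plus the non-zero values.
    return rows * cols + nnz * value_bits


def coordinate_bits(rows, cols, nnz, value_bits=16):
    # Assumed cost model: each non-zero stores a flattened index plus its value.
    index_bits = max(1, (rows * cols - 1).bit_length())
    return nnz * (index_bits + value_bits)


def choose_encoding(matrix, value_bits=16):
    """Return (format_name, storage_bits) for the cheaper of the two formats."""
    rows, cols = len(matrix), len(matrix[0])
    nnz = count_nonzeros(matrix)
    b = bitmap_bits(rows, cols, nnz, value_bits)
    c = coordinate_bits(rows, cols, nnz, value_bits)
    return ("bitmap", b) if b <= c else ("coordinate", c)


def encode_coordinate(matrix):
    """Coordinate format: list of (flattened_index, value) for non-zeros."""
    cols = len(matrix[0])
    return [(r * cols + c, v)
            for r, row in enumerate(matrix)
            for c, v in enumerate(row) if v != 0]


def encode_bitmap(matrix):
    """Bitmap format: (presence bits in row-major order, packed non-zero values)."""
    bits = [1 if v != 0 else 0 for row in matrix for v in row]
    values = [v for row in matrix for v in row if v != 0]
    return bits, values


# A very sparse 4x4 matrix and a mostly dense one (illustrative data only).
very_sparse = [[0, 0, 0, 0], [0, 7, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mostly_dense = [[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [1, 2, 3, 0]]
```

Under this assumed cost model, the coordinate format wins when very few entries are non-zero, because it avoids paying one bit per element, while the bitmap format wins at lower sparsity, because its per-non-zero overhead is smaller.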

Paper Structure (20 sections, 2 equations, 14 figures, 2 tables)

This paper contains 20 sections, 2 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: An illustration of novel view synthesis, the rendering task that NeRF (Mildenhall et al., 2020) aims to solve.
  • Figure 2: NeRF (Mildenhall et al., 2020) based rendering includes Step ❶ Map pixels to rays $\mathbf{r} = \mathbf{o}+t\mathbf{d}$ by marching camera rays through the scene, Step ❷ Query the features (i.e., the RGB color and the density $\sigma$) of points along the rays by inputting their locations and distance to an MLP model, and Step ❸ Render pixels' colors.
  • Figure 3: TensoRF (Chen et al., 2022), which achieves SOTA NeRF efficiency, replaces Step ❷ (i.e., query the features of points along the rays using an MLP) in NeRF (Mildenhall et al., 2020) with both Step ❷-①, which locates pre-existing points using an occupancy grid, and Step ❷-②, which computes pre-existing points' features based on a decomposed embedding grid in terms of matrix-vector pairs.
  • Figure 4: Runtime breakdown across eight datasets on three representative commercial devices. Among Step ❶ (i.e., map pixels to rays), Step ❷-① (i.e., locate the pre-existing points), Step ❷-② (i.e., compute pre-existing points' features), and Step ❸ (i.e., render pixels' colors), the SOTA efficient NeRF solution (Chen et al., 2022) is bottlenecked by Step ❷-① and Step ❷-②; the latter comprises Step ❷-②-Embedding-Grid and Step ❷-②-MLP, which correspond to the operations in Eq. \ref{eq:tensorf_decomp} and to the MLP inference, respectively.
  • Figure 5: The sparsity of different weights in Eq. \ref{eq:tensorf_decomp} on the Drums, Hotdog, Lego, and Mic datasets, where Density - $\mathbf{M}^{X,Y}$ represents the matrices in the $X$, $Y$ plane for densities and the notations of the other weights can be interpreted in the same way.
  • ...and 9 more figures
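
The three rendering steps named in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the standard NeRF alpha-compositing rule for Step ❸, takes the per-point densities and colors from Step ❷ as given inputs rather than querying a model, and the function names `ray_point` and `composite` are hypothetical, not taken from the paper.

```python
import math


def ray_point(o, d, t):
    """Step 1: a point on the ray r = o + t*d for origin o and direction d."""
    return tuple(oi + t * di for oi, di in zip(o, d))


def composite(sigmas, colors, deltas):
    """Step 3: blend per-point colors into one pixel color.

    sigmas: densities of the sampled points, ordered front to back
    colors: (r, g, b) tuples of the sampled points
    deltas: distances between adjacent samples along the ray
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel)


# Step 2 is stubbed out: these densities and colors are illustrative values only.
empty_ray = composite([0.0, 0.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0])
opaque_first = composite([1e9, 5.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0])
```

Points with zero density contribute nothing to the pixel, and points behind an opaque surface receive zero weight, which is the kind of wasted work that sampling only pre-existing points and skipping invisible points aims to avoid.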