Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture

Yu Feng; Weikai Lin; Zihan Liu; Jingwen Leng; Minyi Guo; Han Zhao; Xiaofeng Hou; Jieru Zhao; Yuhao Zhu

TL;DR

Potamoi addresses the bottlenecks of real-time NeRF rendering on resource-constrained devices by introducing SpaRW, a plug-and-play radiance-warping algorithm, and a fully streaming, memory-centric dataflow augmented by a hardware Gathering Unit. The approach unifies diverse NeRF algorithms under a single streaming framework, reducing DRAM accesses and eliminating SRAM bank conflicts, while a proactive runtime overlaps reference-frame rendering with target-frame rendering and can optionally leverage remote servers. Compared with a baseline that uses a dedicated DNN accelerator, Potamoi achieves up to a $53.1\times$ speedup and $67.7\times$ energy savings, with a PSNR drop under $1.0$ dB across multiple NeRF methods. The co-design enables high-quality, real-time neural rendering on mobile and embedded platforms, and it generalizes across varied NeRF representations and encodings.
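SpaRW is described here only at a high level. As a hedged illustration of the general radiance-warping idea (not the paper's algorithm), the following sketch forward-warps a fully rendered reference frame into a nearby target view using per-pixel depth and camera poses, then invokes the NeRF only for target pixels that receive no warped value, such as disocclusions and regions outside the reference view. The pinhole intrinsics (`focal`, `cx`, `cy`), the camera-to-world pose convention, nearest-pixel splatting with a z-buffer, and the hypothetical `render_pixel` callback are all assumptions made for this example.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pose = Tuple[Mat3, Vec3]  # assumed convention: (camera-to-world rotation R, camera position t)


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m: Mat3) -> Mat3:
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


def warp_reference(ref_color, ref_depth, ref_pose: Pose, tgt_pose: Pose,
                   focal: float, cx: float, cy: float):
    """Forward-warp every reference pixel into the target view.

    Target pixels that receive no warped value stay None; they mark
    where the NeRF still has to be evaluated.
    """
    h, w = len(ref_color), len(ref_color[0])
    out = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    (r_ref, t_ref), (r_tgt, t_tgt) = ref_pose, tgt_pose
    r_tgt_inv = transpose(r_tgt)  # the inverse of a rotation is its transpose
    for v in range(h):
        for u in range(w):
            d = ref_depth[v][u]
            # Back-project pixel (u, v) at depth d into reference camera space.
            p_cam = ((u - cx) * d / focal, (v - cy) * d / focal, d)
            # Reference camera -> world.
            p_world = tuple(a + b for a, b in zip(mat_vec(r_ref, p_cam), t_ref))
            # World -> target camera.
            q = mat_vec(r_tgt_inv, tuple(a - b for a, b in zip(p_world, t_tgt)))
            if q[2] <= 0.0:
                continue  # point lies behind the target camera
            u2 = round(focal * q[0] / q[2] + cx)
            v2 = round(focal * q[1] / q[2] + cy)
            # Keep the nearest surface when several pixels land on the same spot.
            if 0 <= u2 < w and 0 <= v2 < h and q[2] < zbuf[v2][u2]:
                zbuf[v2][u2] = q[2]
                out[v2][u2] = ref_color[v][u]
    return out


def sparse_render(warped, render_pixel):
    """Fill only the holes; render_pixel is a hypothetical per-pixel NeRF callback."""
    filled = [row[:] for row in warped]
    count = 0
    for v, row in enumerate(filled):
        for u, c in enumerate(row):
            if c is None:
                row[u] = render_pixel(v, u)
                count += 1
    return filled, count
```

With an identity rotation, a constant depth of 2, `focal=2`, `cx=cy=1.5`, and the target camera shifted one unit along $+x$, every warped pixel of a 4×4 frame moves one column left, so only the rightmost column (4 of 16 pixels) is left for the NeRF. In this sketch, the per-frame NeRF workload scales with the number of hole pixels rather than with the full frame.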

Abstract

Neural Radiance Field (NeRF) has emerged as a promising alternative for photorealistic rendering. Despite recent algorithmic advancements, achieving real-time performance on today's resource-constrained devices remains challenging. In this paper, we identify the primary bottlenecks in current NeRF algorithms and introduce a unified algorithm-architecture co-design, Potamoi, designed to accommodate various NeRF algorithms. Specifically, we introduce a runtime system featuring a plug-and-play algorithm, SpaRW, which significantly reduces the per-frame computational workload and alleviates compute inefficiencies. Furthermore, our unified streaming pipeline coupled with customized hardware support effectively tames both SRAM and DRAM inefficiencies by minimizing repetitive DRAM access and completely eliminating SRAM bank conflicts. When evaluated against a baseline utilizing a dedicated DNN accelerator, our framework demonstrates a speed-up and energy reduction of 53.1$\times$ and 67.7$\times$, respectively, all while maintaining high visual quality with less than a 1.0 dB reduction in peak signal-to-noise ratio.
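The abstract attributes DRAM inefficiency to irregular, repeated accesses during Feature Gathering. The sketch below contrasts two access orders for a trilinear-interpolation workload: serving samples one by one through a small LRU buffer, versus a streaming pass that reads each needed table entry exactly once in ascending address order and scatters it to every consumer. This is a generic reordering technique chosen for illustration, not Potamoi's Gathering Unit; the 4-entry buffer, the x-fastest vertex layout, the definition of a non-contiguous access (the next index is not the previous index plus one), and all function names are assumptions.

```python
from collections import OrderedDict, defaultdict
from typing import Dict, List, Sequence, Tuple


def voxel_corners(x: int, y: int, z: int, n: int) -> List[int]:
    """Linear indices of the 8 vertices of voxel (x, y, z) in an n^3 vertex grid (assumed x-fastest layout)."""
    return [(x + dx) + n * (y + dy) + n * n * (z + dz)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


def non_contiguous_fraction(trace: Sequence[int]) -> float:
    """Share of consecutive accesses that do not hit the next entry in memory."""
    if len(trace) < 2:
        return 0.0
    jumps = sum(1 for a, b in zip(trace, trace[1:]) if b != a + 1)
    return jumps / (len(trace) - 1)


def lru_reads(requests: Sequence[int], capacity: int) -> List[int]:
    """DRAM reads when samples are served in order through a small LRU buffer."""
    buf: "OrderedDict[int, None]" = OrderedDict()
    reads: List[int] = []
    for idx in requests:
        if idx in buf:
            buf.move_to_end(idx)  # hit: refresh recency, no DRAM traffic
            continue
        reads.append(idx)  # miss: fetch from DRAM
        buf[idx] = None
        if len(buf) > capacity:
            buf.popitem(last=False)  # evict the least recently used entry
    return reads


def stream_gather(per_sample: Sequence[Sequence[int]], table: Sequence[float]):
    """Read every needed entry once, in ascending address order,
    and scatter it to all (sample, corner) slots that consume it."""
    consumers: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for s, corners in enumerate(per_sample):
        for k, idx in enumerate(corners):
            consumers[idx].append((s, k))
    gathered = [[0.0] * len(c) for c in per_sample]
    read_order = sorted(consumers)
    for idx in read_order:  # one sequential pass over the feature table
        value = table[idx]
        for s, k in consumers[idx]:
            gathered[s][k] = value
    return read_order, gathered
```

For four samples in voxels (0,0,0), (2,2,2), (1,0,0), and (0,0,1) of a 4×4×4 vertex grid, the sample-order baseline issues 32 DRAM reads (the 4-entry buffer never hits), of which 15 of 31 transitions are non-contiguous. The streaming pass issues 24 reads with 9 of 23 transitions non-contiguous, and it still delivers every sample its eight corner features.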

Paper Structure (68 sections, 4 equations, 27 figures)

Introduction
Bottleneck Analysis.
Algorithmic Support.
Run-time Support.
Dataflow Optimization.
Hardware Augmentation.
Background
NeRF Fundamentals
General NeRF Pipeline
Indexing (${\mathcal{I}}$).
Feature Gathering (${\mathcal{G}}$).
Feature Computation (${\mathcal{F}}$).
Motivation
Computation Inefficiencies
Performance and Model Size.
...and 53 more sections

Figures (27)

Figure 1: The rendering pipeline of today's NeRFs consists of three stages: Indexing (${\mathcal{I}}$), Feature Gathering (${\mathcal{G}}$), and Feature Computation (${\mathcal{F}}$). For NeRFs with structured representations, each ray first samples points ($S_{1}$, $S_{2}$, and $S_{3}$) along the ray direction. Each ray sample then gathers and interpolates 3D features from the eight vertices of its intersected voxel ($V_3$, $V_{32}$, and $V_{81}$) to obtain features ($F_1$, $F_2$, and $F_3$), highlighted in purple. For NeRFs with unstructured representations, each ray directly intersects Gaussian points ($P_3$, $P_{13}$, and $P_{111}$) to obtain features, highlighted in green. The features are then fed into a GEMM-based computation that produces partial pixel values, highlighted in orange. The final pixel value is the sum of all partial pixel values mildenhall2021nerf.
Figure 2: Frame rate vs. model size on the Xavier SoC xaviersoc across state-of-the-art NeRF algorithms muller2022instant, sun2022direct, chen2022tensorf, hedman2021baking, chen2023mobilenerf, kerbl20233d.
Figure 3: Normalized execution breakdown across state-of-the-art NeRF algorithms muller2022instant, chen2022tensorf, hu2022efficientnerf, sun2022direct, kerbl20233d, lee2023compact.
Figure 4: Percentage of non-continuous DRAM accesses in Feature Gathering (${\mathcal{G}}$).
Figure 5: On-chip memory miss rate in Feature Gathering (${\mathcal{G}}$) across NeRF algorithms.
...and 22 more figures
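The three stages in Figure 1 can be made concrete with a small, self-contained sketch for a grid-based (structured) NeRF. The dense feature grid stored as `grid[z][y][x]`, the single-layer decoder standing in for the GEMM-based MLP, and all function names are assumptions for illustration. The compositing step follows the standard NeRF volume-rendering weights (mildenhall2021nerf): each sample contributes a partial pixel value $T_i \alpha_i c_i$, with $\alpha_i = 1 - e^{-\sigma_i \delta_i}$ and $T_i = \prod_{j<i}(1 - \alpha_j)$, and the pixel is their sum.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def index_samples(origin: Vec3, direction: Vec3, near: float, far: float, n: int):
    """Indexing (I): n evenly spaced sample points along the ray, plus the step size."""
    step = (far - near) / n
    ts = [near + (i + 0.5) * step for i in range(n)]
    return [tuple(o + t * d for o, d in zip(origin, direction)) for t in ts], step


def gather_trilinear(grid, res: int, p: Sequence[float]) -> List[float]:
    """Feature Gathering (G): blend the features stored at the 8 vertices of the voxel
    containing p. grid[z][y][x] is a feature vector; p is in grid units, 0 <= p <= res - 1."""
    base = [min(max(int(math.floor(c)), 0), res - 2) for c in p]
    frac = [min(max(c - b, 0.0), 1.0) for c, b in zip(p, base)]
    out = [0.0] * len(grid[0][0][0])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                feat = grid[base[2] + dz][base[1] + dy][base[0] + dx]
                for i, f in enumerate(feat):
                    out[i] += w * f
    return out


def compute(feature: Sequence[float], weights: Sequence[Sequence[float]]):
    """Feature Computation (F): a hypothetical one-layer decoder standing in for the
    GEMM-based MLP. Row 0 yields density; rows 1-3 yield RGB."""
    raw = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    sigma = max(raw[0], 0.0)
    rgb = [1.0 / (1.0 + math.exp(-r)) for r in raw[1:4]]
    return sigma, rgb


def render_ray(origin, direction, grid, res, weights, near, far, n):
    """Run I -> G -> F per sample and sum the partial pixel values T_i * alpha_i * c_i."""
    points, delta = index_samples(origin, direction, near, far, n)
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light not yet absorbed along the ray
    for p in points:
        sigma, rgb = compute(gather_trilinear(grid, res, p), weights)
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return color
```

As a sanity check, a 4×4×4 grid whose single feature channel is constant 1.0, decoded to $\sigma = 5$ and gray color 0.5, gives $0.5\,(1 - e^{-15})$ per channel for a ray whose six samples span a length of 3, matching the closed form for a homogeneous medium.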