Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
Yu Feng, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Han Zhao, Xiaofeng Hou, Jieru Zhao, Yuhao Zhu
TL;DR
Potamoi addresses the bottlenecks of real-time NeRF rendering on resource-constrained devices with two components: SpaRW, a plug-and-play radiance-warping technique, and a fully streaming, memory-centric dataflow supported by a Gathering Unit hardware block. The approach unifies diverse NeRF algorithms under a single streaming framework, reducing DRAM accesses and eliminating SRAM bank conflicts, while a proactive runtime overlaps reference-frame rendering with target-frame rendering and optionally leverages remote servers. Empirically, Potamoi achieves up to $53.1\times$ speedup and $67.7\times$ energy savings with PSNR degradation under $1.0$ dB across multiple NeRF methods, relative to a baseline that uses a dedicated DNN accelerator. The co-design targets high-quality, real-time neural rendering on mobile and embedded platforms and generalizes across varied NeRF representations and encodings.
Abstract
Neural Radiance Field (NeRF) has emerged as a promising alternative for photorealistic rendering. Despite recent algorithmic advancements, achieving real-time performance on today's resource-constrained devices remains challenging. In this paper, we identify the primary bottlenecks in current NeRF algorithms and introduce a unified algorithm-architecture co-design, Potamoi, designed to accommodate various NeRF algorithms. Specifically, we introduce a runtime system featuring a plug-and-play algorithm, SpaRW, which significantly reduces the per-frame computational workload and alleviates compute inefficiencies. Furthermore, our unified streaming pipeline coupled with customized hardware support effectively tames both SRAM and DRAM inefficiencies by minimizing repetitive DRAM access and completely eliminating SRAM bank conflicts. When evaluated against a baseline utilizing a dedicated DNN accelerator, our framework demonstrates a speed-up and energy reduction of 53.1$\times$ and 67.7$\times$, respectively, all while maintaining high visual quality with less than a 1.0 dB reduction in peak signal-to-noise ratio.
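The quality claim above is stated as a reduction in peak signal-to-noise ratio (PSNR). As a minimal sketch of how such a reduction can be measured, the snippet below uses the standard definition, PSNR $= 10 \log_{10}(\text{peak}^2 / \text{MSE})$, and compares two renderings against the same ground truth. The function names `psnr` and `psnr_drop`, the flat-list pixel representation, and the default peak value of 1.0 are assumptions made for illustration; this is not the paper's evaluation code.

```python
import math


def psnr(reference, rendered, peak=1.0):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    Pixel values are assumed to lie in [0, peak] (an illustrative convention).
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_drop(ground_truth, baseline, accelerated, peak=1.0):
    """Return how many dB of PSNR the accelerated rendering loses vs. the baseline.

    A positive result means the accelerated rendering is further from the
    ground truth than the baseline rendering is.
    """
    return psnr(ground_truth, baseline, peak) - psnr(ground_truth, accelerated, peak)
```

Under this convention, a result such as "less than a 1.0 dB reduction" would correspond to `psnr_drop(...) < 1.0` for the frames being compared, typically averaged over a test set.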
