Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
Yu Feng, Zihan Liu, Jingwen Leng, Minyi Guo, Yuhao Zhu
TL;DR
Cicero addresses the algorithmic and architectural bottlenecks of neural rendering by combining the SpaRW radiance-warping algorithm, streaming DRAM access, and a bank-conflict-free on-chip data layout in a co-designed SoC. SpaRW reuses radiances from nearby frames to cut up to 88% of NeRF MLP computations with minimal PSNR loss, while the memory optimizations convert DRAM accesses into a streaming pattern and eliminate SRAM bank conflicts through a channel-major data layout. A dedicated Gathering Unit, added as a hardware augmentation, and a memory-centric pipeline enable fully-streaming NeRF rendering with modest area overhead, achieving up to 28.2× speed-up and 37.8× energy savings on mobile hardware compared to a DNN-accelerated baseline. Across synthetic and real-world datasets, Cicero delivers these gains for both local and remote VR/AR rendering while keeping rendering quality acceptable, which makes real-time neural rendering on mobile platforms more practical.
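The idea of reusing radiances from a nearby frame can be sketched as a forward warp: back-project each reference pixel with its depth, move it into the target camera frame, and re-project it. The sketch below is an illustration under strong assumptions, not the authors' SpaRW implementation: the `Pinhole` camera, the `warp_radiance` function, the translation-only camera motion, and the nearest-pixel splat are all hypothetical simplifications. Target pixels that receive no warped sample are returned as holes, which a full NeRF pass would still have to render.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole camera shared by both views."""
    f: float      # focal length in pixels
    cx: float     # principal point, x
    cy: float     # principal point, y
    width: int
    height: int


def warp_radiance(ref_rgb, ref_depth, cam, translation):
    """Forward-warp reference radiances into a translated target view.

    `translation` is added to each 3D point in the camera frame (assumption:
    no rotation between the two views). Returns the partially filled target
    image and the list of (u, v) holes left for a full NeRF pass.
    """
    tx, ty, tz = translation
    tgt_rgb = [[None] * cam.width for _ in range(cam.height)]
    tgt_depth = [[float("inf")] * cam.width for _ in range(cam.height)]

    for v in range(cam.height):
        for u in range(cam.width):
            z = ref_depth[v][u]
            # Back-project the reference pixel to a 3D point.
            x = (u - cam.cx) * z / cam.f
            y = (v - cam.cy) * z / cam.f
            # Move the point into the target camera frame (translation only).
            xt, yt, zt = x + tx, y + ty, z + tz
            if zt <= 0:
                continue  # behind the target camera
            # Project onto the target image plane with a nearest-pixel splat.
            ut = round(cam.f * xt / zt + cam.cx)
            vt = round(cam.f * yt / zt + cam.cy)
            inside = 0 <= ut < cam.width and 0 <= vt < cam.height
            # Z-test: keep the closest surface when several points collide.
            if inside and zt < tgt_depth[vt][ut]:
                tgt_depth[vt][ut] = zt
                tgt_rgb[vt][ut] = ref_rgb[v][u]

    holes = [(u, v) for v in range(cam.height) for u in range(cam.width)
             if tgt_rgb[v][u] is None]
    return tgt_rgb, holes


# Toy 4x4 reference frame: a flat wall at depth 2 with per-pixel "radiance".
cam = Pinhole(f=2.0, cx=1.5, cy=1.5, width=4, height=4)
ref_rgb = [[(u, v, 0) for u in range(4)] for v in range(4)]
ref_depth = [[2.0] * 4 for _ in range(4)]

# Shifting every point by +1 along x moves the image by exactly one pixel here.
tgt_rgb, holes = warp_radiance(ref_rgb, ref_depth, cam, (1.0, 0.0, 0.0))
reused = 1 - len(holes) / (cam.width * cam.height)
```

In this toy setup, 12 of the 16 target pixels are filled by warping and only the newly exposed left column remains as holes. The sketch also assumes that radiance does not change between the two nearby viewpoints, which is the approximation that makes reuse possible at a small quality cost.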
Abstract
Neural Radiance Field (NeRF) is widely seen as an alternative to traditional physically-based rendering. However, NeRF has not yet been adopted in resource-limited mobile systems such as Virtual and Augmented Reality (VR/AR) because it is extremely slow: on a mobile Volta GPU, even state-of-the-art NeRF models generally execute at only 0.8 FPS. We show that the main performance bottlenecks are both algorithmic and architectural. We introduce CICERO to tame both forms of inefficiency. We first introduce two algorithms: one fundamentally reduces the amount of work any NeRF model has to execute, and the other eliminates irregular DRAM accesses. We then describe an on-chip data layout strategy that eliminates SRAM bank conflicts. A pure software implementation of CICERO offers an 8.0× speed-up and 7.9× energy saving over a mobile Volta GPU. When compared to a baseline with a dedicated DNN accelerator, our speed-up and energy reduction increase to 28.2× and 37.8×, respectively, all with minimal quality loss (less than 1.0 dB reduction in peak signal-to-noise ratio).
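To make the bank-conflict point concrete, the following toy model counts the cycles needed to fetch feature vectors from single-ported SRAM banks under two layouts. It is a sketch under stated assumptions, not the hardware described in the paper: the bank count, channel count, the vertex-interleaved baseline mapping, and both access schedules are hypothetical. The point it illustrates is that when each bank holds one feature channel and parallel lanes fetch different channels of the same vertex, every lane hits a different bank no matter which vertices are requested.

```python
from collections import Counter

NUM_BANKS = 4      # hypothetical number of single-ported SRAM banks
NUM_CHANNELS = 4   # hypothetical feature channels per grid vertex


def bank_vertex_interleaved(vertex, channel):
    """Baseline assumption: a vertex's whole feature vector sits in one bank."""
    return vertex % NUM_BANKS


def bank_channel_major(vertex, channel):
    """Channel-major assumption: each channel lives in its own bank."""
    return channel % NUM_BANKS


def cycles(schedule, bank_of):
    """Total cycles when each bank serves one request per cycle.

    `schedule` is a list of steps; each step is a list of (vertex, channel)
    requests issued together. A step costs as many cycles as the busiest bank.
    """
    total = 0
    for step in schedule:
        per_bank = Counter(bank_of(v, c) for v, c in step)
        total += max(per_bank.values())
    return total


def vertex_parallel(vertices):
    """Each lane owns one vertex; all lanes walk the channels in lockstep."""
    return [[(v, c) for v in vertices] for c in range(NUM_CHANNELS)]


def channel_parallel(vertices):
    """Each lane owns one channel; vertices are visited one after another."""
    return [[(v, c) for c in range(NUM_CHANNELS)] for v in vertices]


# Irregular vertex ids that happen to collide in the baseline mapping.
irregular = [3, 7, 11, 19]
baseline_cycles = cycles(vertex_parallel(irregular), bank_vertex_interleaved)
channel_major_cycles = cycles(channel_parallel(irregular), bank_channel_major)
```

With these four vertex ids, the baseline serializes every step and needs 16 cycles, while the channel-major schedule needs 4, one per vertex. The baseline cost depends on which vertices are requested; the channel-major cost in this model does not, as long as the number of channels does not exceed the number of banks.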
