NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis
Yubin Hu, Xiaoyang Guo, Yang Xiao, Jingwei Huang, Yong-Jin Liu
TL;DR
NGP-RT targets the main bottleneck of real-time NeRF rendering, the per-point MLP evaluation, by storing explicit multi-level hash features and fusing them with a lightweight, per-level attention mechanism. It also reduces memory access during ray marching by using a pre-computed occupancy distance grid to choose step sizes. The method trains coarse features with an auxiliary NGP model and then bakes both the coarse features and the attention parameters into grids for fast inference, reaching over 100 fps at 1080p while maintaining high-quality renderings on Mip-NeRF 360. Empirical results show strong rendering quality relative to real-time baselines and substantial speedups over Instant-NGP, and ablations support both the attention design and the occupancy-distance strategy. The approach is aimed at NeRF-based real-time applications, such as interactive VR, that need both efficient and high-quality rendering.
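As a rough illustration of the per-level attention idea, the sketch below fuses explicit features looked up from several hash levels into one output using softmax weights. The softmax form, the scalar per-level logits, and the names `fuse_levels`, `level_features`, and `level_logits` are assumptions made for this example; the method's actual attention parameterization may differ.

```python
import math


def fuse_levels(level_features, level_logits):
    """Fuse per-level explicit features with softmax attention weights.

    level_features: list of L feature vectors (e.g. [r, g, b, sigma]), one per
                    hash level, all looked up at the same sample point.
    level_logits:   list of L scalar attention logits, one per level
                    (hypothetical: assumed to be stored next to the features).
    Returns (fused_vector, weights).
    """
    if len(level_features) != len(level_logits):
        raise ValueError("need exactly one logit per level")

    # Numerically stable softmax over the levels.
    peak = max(level_logits)
    exps = [math.exp(z - peak) for z in level_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the per-level vectors; no MLP is evaluated.
    dim = len(level_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, level_features))
        for i in range(dim)
    ]
    return fused, weights
```

In this sketch, a level whose hash entry is corrupted by a collision can be suppressed by giving it a low logit, so it contributes little to the fused color and density. The per-point cost is a handful of multiply-adds rather than a network evaluation.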
Abstract
This paper presents NGP-RT, a novel approach for enhancing the rendering speed of Instant-NGP to achieve real-time novel view synthesis. As a classic NeRF-based method, Instant-NGP stores implicit features in multi-level grids or hash tables and applies a shallow MLP to convert the implicit features into explicit colors and densities. Although it trains quickly, its rendering speed still leaves considerable room for improvement, especially for real-time applications, because aggregating the implicit multi-level features requires executing the MLP at every sample point. To address this challenge, our proposed NGP-RT explicitly stores colors and densities as hash features and leverages a lightweight attention mechanism, instead of a computationally intensive MLP, to disambiguate hash collisions. At the rendering stage, NGP-RT incorporates a pre-computed occupancy distance grid into the ray marching strategy; the grid provides the distance to the nearest occupied voxel, thereby reducing the number of marching points and global memory accesses. Experimental results show that on the challenging Mip-NeRF 360 dataset, NGP-RT achieves better rendering quality than previous NeRF-based methods while rendering at 108 fps at 1080p resolution on a single NVIDIA RTX 3090 GPU. Our approach is promising for NeRF-based real-time applications that require efficient and high-quality rendering.
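The occupancy-distance idea can be sketched on a toy voxel grid. The example below is a minimal illustration, not the paper's implementation: it assumes unit-size voxels, a Chebyshev distance measured in voxels, a brute-force grid construction, and a conservative skip of `gap - 1` voxels. The names `build_distance_grid` and `march` are hypothetical.

```python
import math
from itertools import product


def build_distance_grid(occupied, size):
    """Chebyshev distance (in voxels) from every voxel to the nearest occupied one.

    Brute force, for illustration only; a real system would precompute this once.
    """
    dist = {}
    for voxel in product(range(size), repeat=3):
        dist[voxel] = min(
            (max(abs(a - b) for a, b in zip(voxel, occ)) for occ in occupied),
            default=size,  # no occupied voxels at all
        )
    return dist


def march(origin, direction, dist, size, base_step=0.5):
    """March a ray through the grid; return (sample_ts, number_of_grid_lookups)."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    t, samples, lookups = 0.0, [], 0
    while True:
        point = [o + t * c for o, c in zip(origin, unit)]
        voxel = tuple(math.floor(c) for c in point)
        if any(i < 0 or i >= size for i in voxel):
            break  # the ray has left the grid
        lookups += 1
        gap = dist[voxel]
        if gap == 0:
            samples.append(t)  # occupied voxel: sample here
            t += base_step
        else:
            # All voxels within Chebyshev distance gap - 1 are empty, so a step
            # of gap - 1 voxel lengths cannot jump over an occupied voxel.
            t += max(base_step, float(gap - 1))
    return samples, lookups


# Toy scene: a single occupied voxel near the far end of an 8^3 grid.
dist = build_distance_grid({(6, 0, 0)}, size=8)
samples, lookups = march((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), dist, size=8)
```

In this toy scene the ray performs 6 grid lookups and samples at `t = 5.5` and `t = 6.0`, whereas marching with a fixed step of 0.5 voxels would need 15 lookups to cross the same grid. The skip rule here is deliberately conservative; the step policy used by NGP-RT may differ.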
