NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis

Yubin Hu; Xiaoyang Guo; Yang Xiao; Jingwei Huang; Yong-Jin Liu

TL;DR

NGP-RT tackles the rendering-speed bottleneck of real-time NeRF by storing explicit multi-level hash features and fusing them with a lightweight, per-level attention mechanism instead of running an MLP at every sample point. It also reduces memory access during ray marching with a pre-computed occupancy distance grid that guides the step size. Coarse features are trained with an auxiliary NGP model, and both the coarse features and the attention parameters are then baked into grids for fast inference. On Mip-NeRF 360 the method renders at over 100 fps at 1080p (108 fps on a single RTX 3090) while maintaining high rendering quality. Experiments show better rendering quality than previous NeRF-based real-time baselines and substantial speedups over Instant-NGP, and ablations validate both the attention design and the occupancy-distance strategy. The approach brings high-fidelity NeRF rendering closer to practical use in interactive VR and similar applications.
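The attention-based fusion of multi-level hash features can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the spatial hash (XOR of integer coordinates multiplied by the primes 1, 2654435761, and 805459861, modulo the table size) follows Instant-NGP, while the softmax-over-levels weighting, the helper names `hash_index`, `level_feature`, and `fuse_levels`, and all sizes are assumptions made for illustration. The per-level attention logits are passed in directly here; in NGP-RT they would be read from the baked attention grid.

```python
import numpy as np

# Spatial-hash primes from Instant-NGP; the first axis uses 1.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def hash_index(corner, table_size):
    """Hash an integer grid corner: XOR of coordinate * prime, modulo table size."""
    c = corner.astype(np.uint64) * PRIMES  # uint64 multiplication wraps on overflow
    return int((c[0] ^ c[1] ^ c[2]) % np.uint64(table_size))


def level_feature(x, table, resolution):
    """Trilinearly interpolate one level's hashed features at x in [0, 1]^3."""
    pos = x * resolution
    base = np.floor(pos).astype(np.int64)
    frac = pos - base
    feat = np.zeros(table.shape[1])
    for offset in np.ndindex(2, 2, 2):  # the 8 corners of the enclosing cell
        offset = np.array(offset)
        weight = np.prod(np.where(offset == 1, frac, 1.0 - frac))
        feat += weight * table[hash_index(base + offset, table.shape[0])]
    return feat


def fuse_levels(level_feats, logits):
    """Hypothetical attention fusion: softmax over the L levels, then a weighted sum.

    level_feats: (L, C) per-level features; logits: (L,) attention logits.
    """
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w /= w.sum()
    return w @ level_feats  # (L,) @ (L, C) -> (C,)


rng = np.random.default_rng(0)
L, T, C = 4, 2**10, 8  # levels, table size, channels (illustrative values)
tables = rng.normal(size=(L, T, C))
resolutions = [16 * 2**lvl for lvl in range(L)]  # geometric resolution growth
x = np.array([0.3, 0.6, 0.1])
feats = np.stack([level_feature(x, tables[lvl], resolutions[lvl]) for lvl in range(L)])
f = fuse_levels(feats, rng.normal(size=L))
print(f.shape)  # (8,)
```

Because the attention weights replace the per-point MLP, fusing the levels costs only a softmax and a weighted sum per sample. For simplicity the sketch hashes every level, whereas Instant-NGP indexes coarse levels densely when they fit in the table.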

Abstract

This paper presents NGP-RT, a novel approach that improves the rendering speed of Instant-NGP to achieve real-time novel view synthesis. As a classic NeRF-based method, Instant-NGP stores implicit features in multi-level grids or hash tables and applies a shallow MLP to convert them into explicit colors and densities. Although Instant-NGP trains quickly, its rendering speed leaves considerable room for improvement, especially for real-time applications, because aggregating the implicit multi-level features requires an MLP evaluation at every sample point. To address this challenge, NGP-RT explicitly stores colors and densities as hash features and uses a lightweight attention mechanism, instead of a computationally intensive MLP, to disambiguate hash collisions. At the rendering stage, NGP-RT incorporates into its ray marching strategy a pre-computed occupancy distance grid that records the distance to the nearest occupied voxel, thereby reducing the number of marching points and global memory accesses. Experimental results on the challenging Mip-NeRF 360 dataset show that NGP-RT achieves better rendering quality than previous NeRF-based methods while running at 108 fps at 1080p resolution on a single NVIDIA RTX 3090 GPU. Our approach is promising for NeRF-based real-time applications that require efficient, high-quality rendering.
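A minimal sketch of occupancy-distance-guided ray marching follows, under stated assumptions: a single occupancy level rather than a multi-level occupancy grid, a Chebyshev (L-infinity) distance measured in voxels, and hypothetical helpers `dilate`, `occupancy_distance`, and `march`. If a voxel stores distance $d$, every voxel within Chebyshev distance $d-1$ of it is empty, and a point inside that voxel can move $d-1$ voxel lengths along a unit-length ray without leaving that empty region, so the marcher can skip ahead instead of taking many small steps.

```python
import numpy as np


def dilate(mask):
    """One step of 26-neighbourhood (3x3x3) binary dilation."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    nx, ny, nz = mask.shape
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out |= padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out


def occupancy_distance(occ):
    """Chebyshev distance (in voxels) from every voxel to the nearest occupied one."""
    dist = np.full(occ.shape, max(occ.shape), dtype=np.int32)  # cap if nothing is occupied
    if not occ.any():
        return dist
    reached, d = occ.copy(), 0
    dist[reached] = 0
    while not reached.all():  # k dilations reach exactly the voxels at distance <= k
        d += 1
        grown = dilate(reached)
        dist[grown & ~reached] = d
        reached = grown
    return dist


def march(origin, direction, occ, dist, base_step, t_max, use_distance):
    """March a ray in voxel units; voxel (i, j, k) covers [i, i+1) x [j, j+1) x [k, k+1).

    Returns (number of grid lookups, list of occupied voxels that received samples).
    """
    direction = direction / np.linalg.norm(direction)
    t, lookups, samples = 0.0, 0, []
    while t < t_max:
        v = np.floor(origin + t * direction).astype(int)
        if np.any(v < 0) or np.any(v >= occ.shape):
            break
        lookups += 1
        v = tuple(v)
        if occ[v]:
            samples.append(v)  # a shading sample would be taken here
            t += base_step
        elif use_distance and dist[v] > 1:
            t += max(base_step, float(dist[v] - 1))  # safe skip through empty space
        else:
            t += base_step
    return lookups, samples


occ = np.zeros((32, 32, 32), dtype=bool)
occ[20:24, 14:18, 14:18] = True
dist = occupancy_distance(occ)
o, d = np.array([0.5, 16.5, 16.5]), np.array([1.0, 0.0, 0.0])
n_base, s_base = march(o, d, occ, dist, 0.25, 31.0, use_distance=False)
n_fast, s_fast = march(o, d, occ, dist, 0.25, 31.0, use_distance=True)
print(n_base, n_fast)
```

In this toy scene both runs sample the same occupied voxels, but the distance-guided run performs far fewer grid lookups; fewer lookups is the mechanism behind the reduced marching points and global memory access described above.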

Paper Structure (21 sections, 12 equations, 11 figures, 8 tables)

Introduction
Related Work
Preliminaries
Method
NGP-RT Overview
Feature Fusion with Lightweight Attention
Ray Marching with Occupancy Distance
Experiments
Comparisons
Ablation Study
Conclusion
Per-scene Results
Implementation Details
Network Design
Loss Functions
...and 6 more sections

Figures (11)

Figure 1: Comparison of the feature construction methods in SNeRG [snerg], MERF [reiser2023merf], and our NGP-RT. We use a lightweight attention mechanism to efficiently aggregate the multi-level explicit hash features, which fully exploits the expressive power of NGP-style features under the deferred NeRF architecture for fast rendering.
Figure 2: 2D illustration of the feature construction pipeline of NGP-RT. Following the practice in MERF [reiser2023merf], NGP-RT constructs the deferred NeRF feature ${\bf f}$ from a coarse-grained part $\tilde{\bf f}$ and a fine-grained part $\hat{\bf f}$. At the training stage, we optimize $\tilde{\bf f}$ and the attention parameters ${\bf a}$ with an auxiliary NGP model. At the inference stage, we bake them into the low-resolution grids $\tilde{\mathcal{F}}$ and $\mathcal{A}$ for fast access and real-time rendering. The fine-grained feature $\hat{\bf f}$ is fused from the high-resolution hash features by the lightweight attention mechanism. NGP-RT employs the rendering process of deferred NeRF [snerg] for volume rendering. We omit most superscripts $i$ for simplicity.
Figure 3: Comparison between (a) the previous ray marching strategy, which uses only the multi-level occupancy grid $\mathcal{O}$, and (b) our strategy, which uses both $\mathcal{O}$ and the occupancy distance grid $\mathcal{G}$.
Figure 4: Comparisons of NGP-RT with previous methods and ground-truth images on several scenes from the Mip-NeRF 360 dataset. NGP-RT avoids spurious floaters and renders lighting effects more accurately.
Figure 5: Rendering results from NGP-RT with different fine-grained hash feature levels.
...and 6 more figures
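The deferred NeRF rendering referenced in Figure 2 can be sketched as below. It follows the general SNeRG-style deferred design (composite per-sample features with volume-rendering weights, then run a small view-dependent MLP once per ray rather than once per sample). The 8-channel layout (density, diffuse RGB, 4-D specular feature), the summation of the coarse and fine parts, the activations, the helper name `deferred_render`, and the randomly initialized MLP weights are all assumptions for illustration, not NGP-RT's actual configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    return np.log1p(np.exp(x))


def deferred_render(coarse, fine, deltas, view_dir, mlp):
    """Render one ray from N samples with deferred shading.

    coarse, fine: (N, 8) features, assumed layout [density, diffuse rgb (3), specular (4)].
    deltas: (N,) distances between consecutive samples.
    """
    f = coarse + fine  # assumption: coarse and fine parts are summed
    sigma = softplus(f[:, 0])
    rgb_d = sigmoid(f[:, 1:4])
    spec = sigmoid(f[:, 4:8])
    alpha = 1.0 - np.exp(-sigma * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    w = trans * alpha  # standard volume-rendering weights
    diffuse = w @ rgb_d  # (3,) composited diffuse color
    feature = w @ spec  # (4,) composited specular feature
    # Deferred step: the view-dependent MLP runs once per ray, not once per sample.
    w1, b1, w2, b2 = mlp
    h = np.maximum(0.0, np.concatenate([diffuse, feature, view_dir]) @ w1 + b1)
    specular = sigmoid(h @ w2 + b2)
    return np.clip(diffuse + specular, 0.0, 1.0), w


rng = np.random.default_rng(0)
N = 64
mlp = (0.1 * rng.normal(size=(10, 16)), np.zeros(16),
       0.1 * rng.normal(size=(16, 3)), np.zeros(3))
coarse = rng.normal(size=(N, 8))
fine = 0.1 * rng.normal(size=(N, 8))
color, weights = deferred_render(coarse, fine, np.full(N, 0.05),
                                 np.array([0.0, 0.0, 1.0]), mlp)
```

Moving the MLP out of the per-sample loop is what lets a deferred design keep view-dependent effects while spending almost all per-sample work on cheap grid lookups and compositing.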
