MixRT: Mixed Neural Representations For Real-Time NeRF Rendering
Chaojian Li, Bichen Wu, Peter Vajda, Yingyan Celine Lin
TL;DR
MixRT introduces a mixed neural representation for real-time NeRF rendering that combines a low-quality mesh, a view-dependent displacement map, and a compressed NeRF model stored in a hash table. The design exploits the rasterizers, texture units, and SIMD units available on common graphics hardware and in WebGL. Ray-mesh intersections from the rasterized mesh are calibrated by a spherical-harmonics (SH)-based view-dependent displacement, and the calibrated points then query the hash-table embeddings, which are decoded into color. Compared with prior real-time methods, MixRT renders in real time (over 30 FPS at 1280×720 on a MacBook M1 Pro laptop), improves PSNR by about 0.2 dB on the indoor scenes of the Unbounded-360 dataset, and needs less than 80% of the storage of state-of-the-art methods. The results suggest that high geometric complexity is not strictly necessary for photorealistic rendering, offering a practical route to NeRF applications on edge devices.
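The query path summarized above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. It assumes the rasterizer has already returned the mesh hit distance `t_mesh` along a unit view ray, models the view-dependent displacement as a scalar offset along the ray given by degree-1 real SH coefficients, and replaces the multiresolution hash encoding with a single-level, nearest-cell spatial hash (using the Instant-NGP hashing primes) followed by a tiny linear decoder with a sigmoid. All function names (`view_dependent_displacement`, `hash_index`, `query_color`) and parameters are illustrative.

```python
import math
import random

# Standard real spherical-harmonics constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

# Spatial-hash primes from Instant-NGP (assumption: MixRT's exact hash may differ).
PRIMES = (1, 2654435761, 805459861)


def sh_basis_deg1(d):
    """Evaluate the four real SH basis functions (degree <= 1) at unit direction d."""
    x, y, z = d
    return [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]


def view_dependent_displacement(sh_coeffs, d):
    """Hypothetical scalar offset along the ray: dot(sh_coeffs, SH basis at d)."""
    return sum(c * b for c, b in zip(sh_coeffs, sh_basis_deg1(d)))


def hash_index(cell, table_size):
    """XOR spatial hash of an integer 3D grid cell, wrapped to 32 bits, into [0, table_size)."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= (c * p) & 0xFFFFFFFF
    return h % table_size


def query_color(origin, direction, t_mesh, sh_coeffs, table, resolution, weights):
    """Calibrate a rasterized mesh hit, look up a hashed feature, and decode RGB."""
    # Normalize the view direction so t is a distance along the ray.
    norm = math.sqrt(sum(v * v for v in direction))
    d = [v / norm for v in direction]
    # Shift the coarse mesh hit along the ray by the view-dependent displacement.
    t = t_mesh + view_dependent_displacement(sh_coeffs, d)
    point = [o + t * v for o, v in zip(origin, d)]
    # Single-level, nearest-cell lookup (a simplification of multiresolution hash encoding).
    cell = [math.floor(x * resolution) for x in point]
    feature = table[hash_index(cell, len(table))]
    # Tiny linear decoder with a sigmoid maps the feature vector to RGB in (0, 1).
    rgb = [1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(row, feature)))) for row in weights]
    return point, rgb


if __name__ == "__main__":
    random.seed(0)
    table = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2 ** 10)]
    weights = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
    point, rgb = query_color((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.0,
                             [0.1, 0.0, 0.05, 0.0], table, 16, weights)
    print(point, rgb)
```

Keeping the displacement as a single scalar per direction means the coarse mesh only has to be roughly right: the view-dependent term nudges each hit point before the hash lookup, which is the intuition behind pairing a low-quality mesh with a displacement map.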
Abstract
Neural Radiance Field (NeRF) has emerged as a leading technique for novel view synthesis, owing to its impressive photorealistic reconstruction and rendering capability. Nevertheless, achieving real-time NeRF rendering in large-scale scenes remains challenging, which has often led to the adoption of either intricate baked mesh representations with a large number of triangles or resource-intensive ray marching in baked representations. We challenge these conventions, observing that high-quality geometry, represented by meshes with many triangles, is not necessary for achieving photorealistic rendering quality. Consequently, we propose MixRT, a novel NeRF representation that includes a low-quality mesh, a view-dependent displacement map, and a compressed NeRF model. This design effectively harnesses the capabilities of existing graphics hardware, thus enabling real-time NeRF rendering on edge devices. Leveraging a highly optimized WebGL-based rendering framework, MixRT attains real-time rendering speeds on edge devices (over 30 FPS at a resolution of 1280 × 720 on a MacBook M1 Pro laptop), better rendering quality (0.2 dB higher PSNR on the indoor scenes of the Unbounded-360 dataset), and a smaller storage size (less than 80% of that of state-of-the-art methods).
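The headline numbers above translate into concrete budgets. The short Python sketch below is plain arithmetic, not MixRT code: it computes the per-frame time budget and pixel throughput implied by 30 FPS at 1280 × 720, and the mean-squared-error ratio implied by a 0.2 dB PSNR gain, using the standard definition PSNR = 10·log10(MAX²/MSE). The helper names `frame_budget` and `mse_ratio_from_psnr_gain` are hypothetical.

```python
def frame_budget(width, height, fps):
    """Return (pixels per frame, milliseconds per frame, pixels per second)."""
    pixels = width * height
    return pixels, 1000.0 / fps, pixels * fps


def mse_ratio_from_psnr_gain(delta_db):
    """MSE_new / MSE_old for a PSNR gain of delta_db, since PSNR = 10*log10(MAX^2 / MSE)."""
    return 10.0 ** (-delta_db / 10.0)


pixels, ms_per_frame, pixels_per_s = frame_budget(1280, 720, 30)
print(pixels, round(ms_per_frame, 2), pixels_per_s)  # 921600 33.33 27648000
print(round(mse_ratio_from_psnr_gain(0.2), 4))       # 0.955
```

In other words, each frame must finish in about 33 ms while shading roughly 27.6 million pixels per second, and a 0.2 dB PSNR gain corresponds to about 4.5% lower MSE.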
