SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
Daniel Duckworth, Peter Hedman, Christian Reiser, Peter Zhizhin, Jean-François Thibert, Mario Lučić, Richard Szeliski, Jonathan T. Barron
TL;DR
SMERF tackles real-time, high-fidelity view synthesis for large-scale scenes under strict memory constraints by distilling a Zip-NeRF teacher into a hierarchical MERF-based student. It divides the scene into $K^3$ coordinate-space subvolumes, each handled by its own submodel. It also partitions the deferred appearance network's parameters on a $P^3$ lattice and uses feature gating, which together increase capacity while keeping per-frame costs low, so scenes render in a web browser on commodity devices. Training uses a distillation regime with appearance and geometry losses, data augmentation, and submodel-consistency regularization, and rendering uses a live viewer accelerated by a distance grid. Together these improve PSNR over prior real-time methods by $0.78$ dB on standard benchmarks and by $1.78$ dB on large scenes. The approach shows that per-frame memory and compute can be kept effectively independent of scene size while reaching Zip-NeRF-like fidelity in real time, which makes interactive exploration of large scenes practical.
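The two partitioning ideas above can be illustrated with a minimal sketch. It makes the following assumptions, none of which come from the paper's code:

* The $K^3$ grid tiles an axis-aligned cube $[lo, hi]^3$ in world coordinates.
* The camera position selects which submodel is active.
* The $P^3$ lattice stores one flattened parameter vector per vertex, and these vectors are blended trilinearly by the camera's position inside the active subvolume.

All function names are hypothetical. The sketch omits the submodels' scene contraction, the MLP itself, and feature gating.

```python
from itertools import product


def submodel_index(cam_pos, lo, hi, K):
    """Index (i, j, k) of the K^3 subvolume containing the camera.

    Assumption: the partitioned region is the cube [lo, hi]^3; cameras
    outside it are clamped to the nearest boundary cell.
    """
    cell = (hi - lo) / K
    return tuple(min(K - 1, max(0, int((c - lo) // cell))) for c in cam_pos)


def local_coords(cam_pos, lo, hi, K, idx):
    """Camera position normalized to [0, 1]^3 within subvolume idx."""
    cell = (hi - lo) / K
    return tuple(
        min(max((c - (lo + i * cell)) / cell, 0.0), 1.0)
        for c, i in zip(cam_pos, idx)
    )


def blend_lattice_params(t, lattice, P):
    """Trilinearly interpolate parameter vectors stored on a P^3 lattice.

    `lattice` (hypothetical layout) maps vertex indices (a, b, c), each in
    0..P-1, to equal-length lists of floats, e.g. flattened MLP weights.
    `t` is a point in [0, 1]^3.
    """
    assert P >= 2, "need at least two lattice vertices per axis"
    u = [x * (P - 1) for x in t]             # continuous lattice coordinates
    base = [min(int(x), P - 2) for x in u]   # lower corner of enclosing cell
    frac = [x - b for x, b in zip(u, base)]  # offset within that cell
    dim = len(next(iter(lattice.values())))
    out = [0.0] * dim
    for corner in product((0, 1), repeat=3):
        # Corner weight = product of the per-axis linear weights.
        w = 1.0
        for f, bit in zip(frac, corner):
            w *= f if bit else 1.0 - f
        key = tuple(b + bit for b, bit in zip(base, corner))
        for n, v in enumerate(lattice[key]):
            out[n] += w * v
    return out
```

For example, `submodel_index((0.5, 3.9, 10.0), 0.0, 4.0, 4)` selects subvolume `(0, 3, 3)`, with the out-of-range z coordinate clamped to the last cell. Passing the resulting `local_coords` to `blend_lattice_params` yields one parameter vector for the current frame. Blending parameters rather than outputs means only a single network is evaluated per pixel, which fits the goal of a bounded per-frame cost.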
Abstract
Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the former in quality while being prohibitively expensive for real-time applications. In this work, we introduce SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m$^2$ at a volumetric resolution of 3.5 mm$^3$. Our method is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency. Our approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real time on commodity smartphones and laptops. Extensive experiments show that our method exceeds the current state of the art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones. We encourage readers to explore these models interactively at our project website: https://smerf-3d.github.io.
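The reported PSNR gains can be restated as reductions in mean squared error using the standard definition $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. With $\mathrm{MAX}$ held fixed, a gain of $\Delta$ dB corresponds to an MSE ratio of $10^{-\Delta/10}$:

$$\frac{\mathrm{MSE}_{\text{new}}}{\mathrm{MSE}_{\text{old}}} = 10^{-\Delta/10}, \qquad \Delta = 0.78 \Rightarrow 10^{-0.078} \approx 0.836, \qquad \Delta = 1.78 \Rightarrow 10^{-0.178} \approx 0.664$$

That is, roughly 16% and 34% lower squared error, respectively. This is only an approximate reading, because benchmark PSNR is usually averaged per image over a test set rather than computed from a pooled MSE.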
