MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen, Christos Sakaridis, Yulun Zhang, Marc Pollefeys, Andreas Geiger, Fisher Yu
TL;DR
MuRF addresses sparse-view novel view synthesis across both small and large camera baselines by introducing a target-view frustum volume, which is aligned with the target image to effectively aggregate information from multiple input views. A multi-view feature encoder generates robust representations, while a (2+1)D CNN-based radiance field decoder regresses a full radiance field from a low-resolution volume, aided by hierarchical volume sampling for efficiency. The approach achieves state-of-the-art results on diverse datasets (e.g., DTU, RealEstate10K, LLFF) and exhibits promising zero-shot generalization on Mip-NeRF 360, demonstrating strong generalization across baselines without per-scene optimization. Overall, MuRF provides a geometry-aware, feed-forward solution that preserves sharp scene structures and scales to high-resolution rendering, with broad applicability to object-centric and scene-scale scenarios.
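The TL;DR mentions a (2+1)D CNN decoder. This sketch assumes "(2+1)D" means the usual factorization of a 3D convolution: a 2D convolution within each depth plane, followed by a 1D convolution across depth planes. That reading is not taken from the paper. The minimal NumPy sketch below applies the factorization to a single-channel volume of shape (D, H, W). All function names are hypothetical. conv3d_same is included only for comparison: with one channel and no nonlinearity in between, the factorized pair equals a single 3D convolution whose kernel is the outer product of the 1D and 2D kernels.

```python
import numpy as np

def conv2d_same(x, k):
    """Zero-padded 'same' 2D cross-correlation of x (H, W) with an odd-sized kernel k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def conv1d_same(x, k, axis):
    """Zero-padded 'same' 1D cross-correlation of x with an odd-length kernel k along one axis."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (len(k) // 2, len(k) // 2)
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for i in range(len(k)):
        out += k[i] * np.take(xp, np.arange(i, i + n), axis=axis)
    return out

def conv2plus1d(vol, k_hw, k_d):
    """Hypothetical (2+1)D conv on a (D, H, W) volume: 2D conv per depth plane, then 1D conv across planes."""
    spatial = np.stack([conv2d_same(plane, k_hw) for plane in vol])
    return conv1d_same(spatial, k_d, axis=0)

def conv3d_same(vol, k):
    """Reference zero-padded 'same' 3D cross-correlation, used only for comparison."""
    D, H, W = vol.shape
    vp = np.pad(vol, [(s // 2, s // 2) for s in k.shape])
    out = np.zeros_like(vol, dtype=float)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            for c in range(k.shape[2]):
                out += k[a, b, c] * vp[a:a + D, b:b + H, c:c + W]
    return out
```

With a k×k×k receptive field, the factorized form uses k² + k weights per channel pair instead of k³. It also gives the depth axis, which indexes the target-aligned planes, a dedicated 1D filter. In a real multi-channel network with nonlinearities between the two steps, the factorization is no longer restricted to rank-1 kernels.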
Abstract
We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to sparse view synthesis under multiple baseline settings (small and large baselines, and different numbers of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane and accordingly construct a target view frustum volume. This target volume representation is spatially aligned with the target view, which lets it effectively aggregate relevant information from the input views for high-quality rendering. Its axis-aligned nature also facilitates subsequent radiance field regression with a convolutional network. The 3D context modeled by the convolutional network enables our method to synthesize sharper scene structures than prior works. MuRF achieves state-of-the-art performance across multiple baseline settings and diverse scenarios, ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K and LLFF). We also show promising zero-shot generalization on the Mip-NeRF 360 dataset, demonstrating the general applicability of MuRF.
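The target view frustum volume described above can be illustrated with a minimal NumPy sketch. Each target pixel is back-projected onto every plane parallel to the target image plane. The resulting 3D points are projected into each input view, and the sampled features are pooled into a (D, H, W, 2C) volume whose H and W axes coincide with the target pixel grid. The following details are assumptions, not taken from the paper:

- pinhole intrinsics K and 4×4 camera-to-world / world-to-camera matrices
- planes spaced uniformly in inverse depth
- nearest-neighbour feature lookup instead of bilinear interpolation
- mean/variance aggregation across views

All function names are hypothetical.

```python
import numpy as np

def plane_depths(near, far, num_planes):
    """Assumed plane placement: depths spaced uniformly in inverse depth, ordered near to far."""
    return 1.0 / np.linspace(1.0 / near, 1.0 / far, num_planes)

def target_frustum_points(K_tgt, c2w_tgt, depths, height, width):
    """World-space points (D, H, W, 3): each target pixel centre placed on every fronto-parallel plane."""
    ys, xs = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)          # homogeneous pixel coords (H, W, 3)
    rays = pix @ np.linalg.inv(K_tgt).T                           # camera-space directions with z = 1
    cam_pts = depths[:, None, None, None] * rays[None]            # z equals the plane depth
    return cam_pts @ c2w_tgt[:3, :3].T + c2w_tgt[:3, 3]           # camera -> world

def sample_view(feat, K, w2c, pts):
    """Nearest-neighbour lookup of a (C, H, W) feature map at projected points.
    Returns features (D, H, W, C), zeroed where invalid, and a validity mask (D, H, W)."""
    C, H, W = feat.shape
    cam = pts @ w2c[:3, :3].T + w2c[:3, 3]                        # world -> input camera
    z = cam[..., 2]
    uvw = cam @ K.T
    safe_z = np.where(z > 1e-6, z, 1.0)                           # avoid dividing by z <= 0
    col = np.floor(uvw[..., 0] / safe_z).astype(int)
    row = np.floor(uvw[..., 1] / safe_z).astype(int)
    valid = (z > 1e-6) & (col >= 0) & (col < W) & (row >= 0) & (row < H)
    out = feat[:, np.clip(row, 0, H - 1), np.clip(col, 0, W - 1)]  # (C, D, H, W)
    return np.moveaxis(out, 0, -1) * valid[..., None], valid

def build_frustum_volume(feats, Ks, w2cs, K_tgt, c2w_tgt, near, far, num_planes, height, width):
    """Aggregate per-view samples into a target-aligned volume of per-point mean and variance."""
    depths = plane_depths(near, far, num_planes)
    pts = target_frustum_points(K_tgt, c2w_tgt, depths, height, width)
    samples, masks = zip(*(sample_view(f, K, T, pts) for f, K, T in zip(feats, Ks, w2cs)))
    samples = np.stack(samples)                                   # (V, D, H, W, C)
    masks = np.stack(masks)[..., None].astype(float)              # (V, D, H, W, 1)
    count = np.maximum(masks.sum(0), 1.0)                         # views that see each point
    mean = samples.sum(0) / count
    var = ((samples - mean) ** 2 * masks).sum(0) / count
    return np.concatenate([mean, var], axis=-1)                   # (D, H, W, 2C)
```

Because the volume's H and W axes match the target pixel grid and D indexes depth planes, ordinary axis-aligned convolutions can operate on it directly. This is the property the abstract credits for easing radiance field regression with a convolutional network.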
