MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo

Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, Hao Su

TL;DR

MVSNeRF reconstructs a neural radiance field from only three nearby input views using fast network inference rather than per-scene optimization. It combines plane-swept cost volumes from multi-view stereo, which provide geometry-aware scene reasoning, with physically based volume rendering. The network is trained on real objects in the DTU dataset and tested on three different datasets, where it generalizes across scenes, including indoor scenes that differ completely from the training objects. When dense images are available, the estimated radiance field can be fine-tuned, which gives higher rendering quality with substantially less optimization time than NeRF.

Abstract

We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
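
The abstract names physically based volume rendering as the step that turns a predicted radiance field into pixel colors. As a rough illustration only, the sketch below shows the standard discrete compositing rule used for NeRF-style rendering along a single ray. The function name `composite_ray`, its arguments, and the sample values are hypothetical and are not taken from the MVSNeRF code; how the densities and colors are predicted from the cost volume is outside the scope of this sketch.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray.

    sigmas: volume density at each sample (non-negative floats)
    colors: (r, g, b) tuple at each sample
    deltas: length of the ray segment covered by each sample
    Returns the rendered (r, g, b) tuple and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        # Light surviving past this segment.
        transmittance *= 1.0 - alpha
    return tuple(pixel), weights


# Hypothetical samples: empty space first, then a dense green surface,
# then a blue sample that is almost fully occluded by the green one.
rgb, weights = composite_ray(
    sigmas=[0.0, 50.0, 50.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

In this toy ray the first sample has zero density and contributes nothing, while the first dense sample receives almost all of the weight, so the rendered color is dominated by green. Because every step is differentiable, a loss on the rendered color can be backpropagated to whatever network predicts the densities and colors.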

Paper Structure

This paper contains 16 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example of caption. It is set in Roman so that mathematics (always set in Roman: $B \sin A = A \sin B$) may be included without an ugly clash.
  • Figure 2: Example of a short caption, which should be centered.