BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes

Chih-Hai Su; Chih-Yao Hu; Shr-Ruei Tsai; Jie-Ying Lee; Chin-Yang Lin; Yu-Lun Liu

TL;DR

BoostMVSNeRFs tackles high-quality view synthesis in large-scale scenes by enabling multi-cost-volume fusion for MVS-based NeRFs. It introduces 3D visibility scores and 2D visibility masks to guide the fusion of multiple cost volumes during volume rendering, and uses a greedy algorithm to select a support set of cost volumes that broadens viewport coverage without additional training. The method is compatible with existing MVS-based NeRF backbones and supports end-to-end fine-tuning, improving PSNR, SSIM, and LPIPS on the Free and ScanNet datasets. The result is more robust and scalable generalizable view synthesis in unbounded outdoor and complex indoor environments, at an efficiency comparable to existing methods.
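The greedy support-set selection mentioned above can be illustrated with a small coverage-maximization sketch. This is not the authors' implementation: it assumes each candidate cost volume's 2D visibility mask has been binarised into a set of covered pixel indices, and the function name `greedy_support_set`, its parameters, and the stopping rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def greedy_support_set(
    masks: Dict[str, Set[int]], num_pixels: int, max_volumes: int
) -> List[str]:
    """Greedily pick cost volumes that add the most uncovered pixels.

    `masks` maps a (hypothetical) cost-volume id to the set of novel-view
    pixel indices it covers, i.e. a binarised stand-in for a 2D visibility
    mask. Selection stops when the view is fully covered, when the budget
    `max_volumes` is reached, or when no candidate adds new pixels.
    """
    selected: List[str] = []
    covered: Set[int] = set()
    remaining = dict(masks)  # copy so the caller's dict is not mutated
    while remaining and len(selected) < max_volumes and len(covered) < num_pixels:
        # Candidate with the largest marginal coverage gain.
        best = max(remaining, key=lambda k: len(remaining[k] - covered))
        if not remaining[best] - covered:
            break  # nothing left adds coverage
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Toy example: an 8-pixel novel view and four candidate cost volumes.
masks = {"a": {0, 1, 2, 3}, "b": {2, 3, 4}, "c": {4, 5, 6, 7}, "d": {0, 1}}
print(greedy_support_set(masks, num_pixels=8, max_volumes=3))  # ['a', 'c']
```

A greedy choice like this does not guarantee a globally optimal set, but it needs no training and only one pass over the candidates per selected volume. The paper's actual selection criterion may use continuous mask values rather than the binary coverage assumed here.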

Abstract

While Neural Radiance Fields (NeRFs) have demonstrated exceptional quality, their protracted training duration remains a limitation. Generalizable and MVS-based NeRFs, although capable of reducing training time, often incur tradeoffs in quality. This paper presents a novel approach called BoostMVSNeRFs to enhance the rendering quality of MVS-based NeRFs in large-scale scenes. We first identify limitations in MVS-based NeRF methods, such as restricted viewport coverage and artifacts due to limited input views. Then, we address these limitations by proposing a new method that selects and combines multiple cost volumes during volume rendering. Our method does not require training and can be applied to any MVS-based NeRF method in a feed-forward fashion to improve rendering quality. Furthermore, our approach is end-to-end trainable, allowing fine-tuning on specific scenes. We demonstrate the effectiveness of our method through experiments on large-scale datasets, showing significant rendering quality improvements in large-scale scenes and unbounded outdoor scenarios. We release the source code of BoostMVSNeRFs at https://su-terry.github.io/BoostMVSNeRFs/.
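To make "combines multiple cost volumes during volume rendering" concrete, the sketch below blends the per-sample predictions of several cost volumes with visibility weights. The exact blending formula is not given in this summary, so the normalised weighted average, the function name `blend_sample`, and the zero-visibility fallback are assumptions for illustration only.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def blend_sample(
    colors: Sequence[RGB],
    densities: Sequence[float],
    scores: Sequence[float],
    eps: float = 1e-8,
) -> Tuple[RGB, float]:
    """Blend K cost volumes' predictions at one 3D sample point.

    colors[k], densities[k]: colour and density predicted from cost volume k.
    scores[k]: visibility score of the sample under cost volume k (assumed
    non-negative). Returns the visibility-weighted colour and density.
    """
    total = sum(scores)
    if total < eps:
        # No cost volume sees this sample: treat it as empty space.
        return (0.0, 0.0, 0.0), 0.0
    weights = [s / total for s in scores]
    rgb = tuple(
        sum(w * c[i] for w, c in zip(weights, colors)) for i in range(3)
    )
    sigma = sum(w * d for w, d in zip(weights, densities))
    return rgb, sigma


# Two cost volumes that see the sample equally well.
rgb, sigma = blend_sample(
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    densities=[2.0, 4.0],
    scores=[0.5, 0.5],
)
print(rgb, sigma)  # (0.5, 0.0, 0.5) 3.0
```

The blended colour and density would then be fed to the usual volume rendering accumulation along each ray, so a cost volume contributes only where it actually observes the scene.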

Paper Structure (37 sections, 13 equations, 15 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Novel view synthesis
Multi-view stereo and generalizable radiance fields
Few-shot NeRFs
Radiance fields fusion
Method
MVS-based NeRFs Preliminaries
3D Visibility Scores and 2D Visibility Masks
Rendering by Combining Multiple Cost Volumes
Support Cost Volume Set Selection
End-to-end Fine-tuning
Experiments
Experimental Settings
Datasets
...and 22 more sections

Figures (15)

Figure 1: 3D visibility scores and 2D visibility masks. For a novel view, depth distribution is estimated from three input views, from which 3D points are sampled and projected onto each view to determine visibility. These projections yield 3D visibility scores $m_j$, normalized across the views, and are subsequently volume rendered into a 2D visibility mask $\textbf{M}^{\text{2D}}$. This mask highlights the contribution of each input view to the cost volume and guides the rendering process, aiding in the selection of input views that optimize rendering quality and field of view coverage.
Figure 2: Combined rendering from multiple cost volumes. Using a single cost volume, as in traditional MVS-based NeRFs, often introduces padding artifacts or incorrect geometry, as indicated by the red dashed circles. Our method warps selected cost volumes to the novel view's frustum and applies 3D visibility scores $m_j$ as weights to blend multiple cost volumes during volume rendering. Combined rendering provides broader viewport coverage and combines information from multiple cost volumes, leading to improved image synthesis and alleviating artifacts.
Figure 3: Support cost volume set selection. Initially, our greedy algorithm selects a single cost volume, which provides maximum coverage but is insufficient to prevent padding artifacts (orange boxes). Subsequent iterations incorporate additional cost volumes, progressively expanding view coverage and improving image quality, as indicated by the increasing PSNR values.
Figure 4: Qualitative comparisons of rendering quality on the Free dataset [wang2023f2].
Figure 5: Qualitative rendering quality improvements of integrating our method into MVS-based NeRF methods on the Free dataset.
...and 10 more figures
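
The Figure 1 caption describes projecting ray samples onto the input views to obtain 3D visibility scores $m_j$, then volume rendering them into a 2D visibility mask. A minimal single-ray sketch follows. It assumes that "normalized across the views" means the fraction of input views that see a sample, and it uses the standard NeRF alpha-compositing weights; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def visibility_scores(visible: Sequence[Sequence[bool]]) -> List[float]:
    """Per-sample visibility score m_j along one ray.

    visible[j][v] is True when sample j projects inside input view v.
    Assumes every sample lists at least one view. The score is the
    fraction of views that see the sample (an assumed normalisation).
    """
    return [sum(row) / len(row) for row in visible]


def render_visibility_mask(
    scores: Sequence[float],
    densities: Sequence[float],
    deltas: Sequence[float],
) -> float:
    """Volume render the scores into one 2D mask value for this ray.

    Uses alpha_j = 1 - exp(-sigma_j * delta_j) and the accumulated
    transmittance T_j = prod_{k<j} (1 - alpha_k), as in standard NeRF
    volume rendering.
    """
    transmittance = 1.0
    mask = 0.0
    for m, sigma, delta in zip(scores, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        mask += transmittance * alpha * m
        transmittance *= 1.0 - alpha
    return mask


# Three ray samples checked against three input views.
scores = visibility_scores(
    [[True, True, True], [True, True, False], [False, False, False]]
)
mask_value = render_visibility_mask(
    scores, densities=[0.1, 5.0, 5.0], deltas=[0.5, 0.5, 0.5]
)
print(scores, mask_value)
```

Repeating this for every pixel of the novel view gives a full 2D mask. Pixels whose mask value is low are poorly observed by that cost volume, which is the kind of signal the captions describe using for blending and for support-set selection.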
