CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency

Hanxin Zhu, Tianyu He, Zhibo Chen

TL;DR

This work addresses NeRF-based few-shot novel view synthesis, where having only a few input views causes overfitting and poor depth estimation. It introduces Cross-view Multiplane Consistency (CMC), which builds a Multiplane Image (MPI) for each input view and enforces depth-aware consistency by sharing sampling points across views, combined with a reconstruction loss on seen views and appearance and depth consistency losses on unseen views. The approach, which combines per-view MPIs, weighted rendering, and cross-view losses, reports state-of-the-art results on the LLFF and Shiny datasets without relying on complex priors or additional supervision, improving both visual quality and depth continuity. By enabling cross-view geometry learning in sparse-view regimes, CMC is relevant to view synthesis in VR/AR and related applications.

Abstract

Neural Radiance Field (NeRF) has shown impressive results in novel view synthesis, particularly in Virtual Reality (VR) and Augmented Reality (AR), thanks to its ability to represent scenes continuously. However, when just a few input view images are available, NeRF tends to overfit the given views and thus make the estimated depths of pixels share almost the same value. Unlike previous methods that conduct regularization by introducing complex priors or additional supervision, we propose a simple yet effective method that explicitly builds depth-aware consistency across input views to tackle this challenge. Our key insight is that by forcing the same spatial points to be sampled repeatedly in different input views, we are able to strengthen the interactions between views and therefore alleviate the overfitting problem. To achieve this, we build the neural networks on layered representations (i.e., multiplane images), so that each sampling point can be resampled on multiple discrete planes. Furthermore, to regularize the unseen target views, we constrain the rendered colors and depths from different input views to be the same. Although simple, extensive experiments demonstrate that our proposed method can achieve better synthesis quality than state-of-the-art methods.

Paper Structure

This paper contains 33 sections, 24 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Given a few input views (e.g., 3 input views), (a) NeRF tends to overfit to input views and results in a dramatic performance drop, where the estimated depths of pixels share almost the same value. (b) Our key insight is to ensure the same spatial points can be sampled repeatedly in different input views. (c) Our proposed method can achieve smooth depth estimation by introducing cross-view multiplane consistency, resulting in better synthesis quality.
  • Figure 2: Qualitative comparisons on the Shiny dataset, where our proposed method can achieve better novel view synthesis and accurate geometry estimation (i.e., the depth map).
  • Figure 3: Qualitative comparisons on the LLFF dataset. Our proposed method can avoid the overfitting problem, where better novel view synthesis and more continuous depth estimation can be achieved.
  • Figure 4: Qualitative comparisons of different choices of loss functions. (1) Single MPI with $\mathcal{L}_{\text{MSE}}$. (2) Per-view MPI with $\mathcal{L}_{\text{MSE}}$. (3) Per-view MPI with $\mathcal{L}_{\text{MSE}}+\mathcal{L}_{\text{dc}}^{\text{I}}$. (4) Per-view MPI with $\mathcal{L}_{\text{MSE}}+\mathcal{L}_{\text{dc}}^{\text{I}}+\mathcal{L}_{\text{ac}}$. (5) Per-view MPI with $\mathcal{L}_{\text{MSE}}+\mathcal{L}_{\text{dc}}^{\text{I}}+\mathcal{L}_{\text{ac}}+\mathcal{L}_{\text{dc}}$.
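
The ablation in Figure 4 adds one loss term at a time, so configuration (5) can be read as the full training objective. Written out, with the caveat that the weights $\lambda_1, \lambda_2, \lambda_3$ are hypothetical placeholders introduced here (the caption shows an unweighted sum, which corresponds to setting all of them to 1):

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{MSE}}
  + \lambda_1 \, \mathcal{L}_{\text{dc}}^{\text{I}}
  + \lambda_2 \, \mathcal{L}_{\text{ac}}
  + \lambda_3 \, \mathcal{L}_{\text{dc}}
```

Based on the TL;DR, $\mathcal{L}_{\text{MSE}}$ is presumably the reconstruction loss on seen views, while $\mathcal{L}_{\text{ac}}$ and $\mathcal{L}_{\text{dc}}$ are presumably the appearance and depth consistency losses on unseen views. The superscript in $\mathcal{L}_{\text{dc}}^{\text{I}}$ likely marks a depth consistency term evaluated on the input views, but this section does not define it.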