CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency
Hanxin Zhu, Tianyu He, Zhibo Chen
TL;DR
This work tackles NeRF-based few-shot novel view synthesis, where scant input views cause overfitting and poor depth estimation. It introduces Cross-view Multiplane Consistency (CMC), which builds per-view Multiplane Images (MPI) and enforces depth-aware consistency by sharing sampling points across views, supplemented by a reconstruction loss on seen views and appearance/depth losses on unseen views. The approach, which combines per-view MPI, weighted rendering, and cross-view losses, achieves state-of-the-art results on the LLFF and Shiny datasets without requiring complex priors or additional supervision, improving both visual quality and geometry continuity. By enabling robust cross-view geometry learning in sparse-view regimes, CMC offers practical gains for real-world view synthesis in VR/AR and related applications.
Abstract
Neural Radiance Field (NeRF) has shown impressive results in novel view synthesis, particularly in Virtual Reality (VR) and Augmented Reality (AR), thanks to its ability to represent scenes continuously. However, when only a few input views are available, NeRF tends to overfit the given views, so that the estimated depths of pixels share almost the same value. Unlike previous methods that regularize by introducing complex priors or additional supervision, we propose a simple yet effective method that explicitly builds depth-aware consistency across input views to tackle this challenge. Our key insight is that by forcing the same spatial points to be sampled repeatedly in different input views, we are able to strengthen the interactions between views and therefore alleviate the overfitting problem. To achieve this, we build the neural networks on layered representations (i.e., multiplane images), so that sampling points can be resampled on multiple discrete planes. Furthermore, to regularize the unseen target views, we constrain the rendered colors and depths from different input views to be the same. Although simple, our proposed method achieves better synthesis quality than state-of-the-art methods in extensive experiments.
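The two ingredients named in the abstract, rendering from a multiplane image and constraining the colors and depths rendered from different input views to agree, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `composite_mpi` and `cross_view_consistency`, the use of standard "over" compositing, the expected-depth estimate, and the L1 penalty are all assumptions made for illustration, and the sketch assumes each input view's MPI has already been warped into the target view.

```python
import numpy as np

def composite_mpi(colors, alphas, depths):
    """Render one view from an MPI with standard 'over' compositing.

    colors: (D, H, W, 3) per-plane RGB, planes ordered near to far
    alphas: (D, H, W)    per-plane opacity in [0, 1]
    depths: (D,)         depth of each plane
    Returns the rendered color (H, W, 3) and expected depth (H, W).
    """
    # Transmittance of plane d: product of (1 - alpha) over all nearer planes.
    trans = np.cumprod(1.0 - alphas, axis=0)
    trans = np.concatenate([np.ones_like(alphas[:1]), trans[:-1]], axis=0)
    weights = alphas * trans  # (D, H, W)
    color = (weights[..., None] * colors).sum(axis=0)
    depth = (weights * depths[:, None, None]).sum(axis=0)
    return color, depth

def cross_view_consistency(render_a, render_b):
    """Hypothetical L1 penalty on the color and depth rendered for the same
    target view from two different input-view MPIs (assumed pre-warped)."""
    color_a, depth_a = render_a
    color_b, depth_b = render_b
    return np.abs(color_a - color_b).mean() + np.abs(depth_a - depth_b).mean()
```

In this sketch the loss is zero only when both input views produce the same color and the same depth at every target pixel, which is the kind of agreement the abstract describes for unseen views.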
