Multiplane Prior Guided Few-Shot Aerial Scene Rendering

Zihan Gao; Licheng Jiao; Lingling Li; Xu Liu; Fang Liu; Puhua Chen; Yuwei Guo

TL;DR

MPNeRF tackles few-shot aerial scene rendering by coupling NeRF with a multiplane prior derived from a Multiplane Image (MPI) representation, whose encoder is a SwinV2 Transformer pre-trained with SimMIM. The method jointly trains a NeRF branch and an MPI branch; the MPI output serves as pseudo-labels, applied through a multiplane loss, to guide NeRF under sparse viewpoints. On LEVIR-NVS, MPNeRF consistently outperforms dense-view and few-shot baselines in PSNR, SSIM, and LPIPS, and ablations confirm the importance of the MPI prior, multi-scale features, and pretraining. The approach offers a practical, data-efficient path for NeRF-based aerial applications where data collection opportunities are limited.
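The role of the pseudo-labels can be illustrated with a minimal sketch of a combined objective. This is an assumption-laden illustration, not the authors' implementation: the function names (`mse`, `total_loss`), the use of mean squared error for both terms, and the default weight `lam` are all hypothetical. Only the overall idea, NeRF supervised by ground truth on seen views and by MPI renderings on unseen views with a weight $\lambda$, comes from the summary above.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a - b) ** 2))

def total_loss(nerf_seen: np.ndarray,
               gt_seen: np.ndarray,
               nerf_unseen: np.ndarray,
               mpi_unseen: np.ndarray,
               lam: float = 0.1) -> float:
    """Hypothetical combined objective (a sketch, not the paper's exact loss).

    nerf_seen   : NeRF rendering at a training view with known ground truth
    gt_seen     : ground-truth image at that training view
    nerf_unseen : NeRF rendering at a novel view without ground truth
    mpi_unseen  : MPI-branch rendering at the same novel view (pseudo-label)
    lam         : assumed weight of the multiplane term
    """
    photometric = mse(nerf_seen, gt_seen)      # supervised term on seen views
    multiplane = mse(nerf_unseen, mpi_unseen)  # pseudo-label term on unseen views
    return photometric + lam * multiplane

# Usage with toy 4x4 RGB images.
seen = np.zeros((4, 4, 3))
target = np.ones((4, 4, 3))
unseen = np.zeros((4, 4, 3))
pseudo = 2.0 * np.ones((4, 4, 3))
loss = total_loss(seen, target, unseen, pseudo, lam=0.5)
```

In a real training loop the pseudo-label would typically be treated as a constant target for the NeRF branch; whether and how gradients flow between the two branches is not stated in this summary.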

Abstract

Neural Radiance Fields (NeRF) have been successfully applied in various aerial scenes, yet they face challenges with sparse views due to limited supervision. The acquisition of dense aerial views is often prohibitive, as unmanned aerial vehicles (UAVs) may be constrained in both perspective range and energy. In this work, we introduce Multiplane Prior guided NeRF (MPNeRF), a novel approach tailored for few-shot aerial scene rendering, marking a pioneering effort in this domain. Our key insight is that the intrinsic geometric regularities specific to aerial imagery can be leveraged to enhance NeRF in sparse aerial scenes. By investigating the behavior of NeRF and Multiplane Images (MPI), we propose to guide the training process of NeRF with a Multiplane Prior. The proposed Multiplane Prior draws upon MPI's benefits and incorporates advanced image comprehension through a SwinV2 Transformer, pre-trained via SimMIM. Our extensive experiments demonstrate that MPNeRF outperforms existing state-of-the-art methods designed for non-aerial contexts, tripling the performance in SSIM and LPIPS even when only three views are available. We hope our work offers insights into the development of NeRF-based applications in aerial scenes with limited data.

Paper Structure (19 sections, 9 equations, 8 figures, 7 tables)

Introduction
Related Work
Scene Representations for View Synthesis
NeRF with Sparse Input
Method
Preliminaries
A Closer Look at The Behavior of NeRF & MPI
Guiding NeRF with a Multiplane Prior
Experiment
Implementation Details
Datasets and Evaluation Metrics
Baseline Methods
Comparative Results Analysis
Ablation Studies and Further Analyses
Limitations and Conclusion
...and 4 more sections

Figures (8)

Figure 1: Visualization of failure modes in NeRF and MPI. (a) MPI models the scene only within a single camera frustum and performs homography warping to render novel views. Insufficient sampling leads to incorrect depth and thus to an overlapping ghosting effect, and large camera movement leads to cropped corners. However, high-frequency details appear to be preserved. (b) NeRF models scenes in a continuous volumetric manner. If only sparse views with large camera movement are provided, some parts of the scene may be sampled very little or never. Insufficient sampling leads to collapsed details and unexpected floaters. (c) Our approach combines the capabilities of NeRF with the perspective-friendly nature of MPI in aerial scenes to achieve photorealistic novel view renderings.
Figure 2: Overall pipeline for training Multiplane Prior guided NeRF (MPNeRF). Our novel MPNeRF architecture integrates a standard NeRF branch with an MPI branch, informed by a pre-trained SwinV2 Transformer. This design introduces a multiplane prior to guide the NeRF training, addressing the common challenges of rendering with sparse aerial data. The process begins by sampling three distinct views: a source and target view for training with known ground truth, and an unseen view from a novel viewpoint. The NeRF model is then refined using pseudo labels produced by the MPI branch, which are especially crucial for synthesizing views from previously unseen angles, as shown in the pipeline.
Figure 3: Visual comparisons on 3 selected scenes with 3 and 5 views. MPNeRF achieves photo-realistic quality in different scenes compared with ground-truth images on novel views.
Figure 4: We investigate the data efficiency achieved by our method. Our method requires up to $63.5\%$ of the training images to achieve performance similar to that of a vanilla NeRF model.
Figure 5: Hyperparameter sensitivity analysis. Performance comparison of our method (MPNeRF) and a baseline NeRF model across different values of the hyperparameter $\lambda$. The graphs show the PSNR, SSIM, and LPIPS metrics. MPNeRF is robust across a wide range of $\lambda$ values.
...and 3 more figures
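
The MPI rendering that the Figure 1 caption refers to can be sketched as standard front-to-back alpha compositing over a stack of fronto-parallel planes. This is a generic illustration under stated assumptions, not the paper's code: the function name `composite_mpi`, the array layout, and the plane ordering (index 0 nearest the camera) are hypothetical, and the homography warp to the target view is assumed to have been applied to the planes beforehand.

```python
import numpy as np

def composite_mpi(rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Composite D planes into one image (generic MPI over-compositing).

    rgb   : (D, H, W, 3) per-plane colors, plane 0 nearest the camera
    alpha : (D, H, W, 1) per-plane opacities in [0, 1]
    Returns an (H, W, 3) image: sum_i rgb_i * alpha_i * prod_{j<i} (1 - alpha_j).
    """
    # Transmittance reaching plane i is the product of (1 - alpha) over
    # all planes in front of it; the nearest plane sees transmittance 1.
    through = np.cumprod(1.0 - alpha, axis=0)
    transmittance = np.concatenate([np.ones_like(alpha[:1]), through[:-1]], axis=0)
    weights = alpha * transmittance
    return (weights * rgb).sum(axis=0)

# Usage: a half-transparent white plane in front of an opaque black plane.
rgb = np.stack([np.ones((2, 2, 3)), np.zeros((2, 2, 3))])
alpha = np.stack([0.5 * np.ones((2, 2, 1)), np.ones((2, 2, 1))])
image = composite_mpi(rgb, alpha)
```

Because each plane carries a single fixed depth, errors in the predicted opacities show up as the ghosting described in panel (a) once the planes are warped to a distant viewpoint.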
