TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis

Jiaming Kang, Keyan Chen, Zhengxia Zou, Zhenwei Shi

TL;DR

Remote sensing novel view synthesis from sparse views suffers from overfitting and from the slow optimization of NeRF-based methods. TriDF introduces a hybrid explicit–implicit representation that decouples color and volume density: color is encoded in a triplane, while geometry is captured by continuous density fields that draw reference features from neighboring views through image-based rendering. Depth-guided optimization using SfM and PatchMatch point clouds stabilizes training. On LEVIR-NVS, TriDF improves rendering quality over advanced few-shot methods (7.4% in PSNR and 3.4% in SSIM) and reconstructs about 30× faster than NeRF-based methods, with TriDF-10k (10k training iterations) achieving comparable quality in about 5 minutes. This enables fast, accurate 3D interpretation for urban planning and environmental monitoring, and the code is publicly available.

Abstract

Remote sensing novel view synthesis (NVS) offers significant potential for 3D interpretation of remote sensing scenes, with important applications in urban planning and environmental monitoring. However, remote sensing scenes frequently lack sufficient multi-view images due to acquisition constraints. While existing NVS methods tend to overfit when processing limited input views, advanced few-shot NVS methods are computationally intensive and perform sub-optimally in remote sensing scenes. This paper presents TriDF, an efficient hybrid 3D representation for fast remote sensing NVS from as few as 3 input views. Our approach decouples color and volume density information, modeling them independently to reduce the computational burden on implicit radiance fields and accelerate reconstruction. We explore the potential of the triplane representation in few-shot NVS tasks by mapping high-frequency color information onto this compact structure, and the direct optimization of feature planes significantly speeds up convergence. Volume density is modeled as continuous density fields, incorporating reference features from neighboring views through image-based rendering to compensate for limited input data. Additionally, we introduce depth-guided optimization based on point clouds, which effectively mitigates the overfitting problem in few-shot NVS. Comprehensive experiments across multiple remote sensing scenes demonstrate that our hybrid representation achieves a 30x speed increase compared to NeRF-based methods, while simultaneously improving rendering quality metrics over advanced few-shot methods (7.4% increase in PSNR and 3.4% in SSIM). The code is publicly available at https://github.com/kanehub/TriDF
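
To make the triplane idea concrete, here is a minimal NumPy sketch of a generic triplane feature lookup: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. This is an illustration under stated assumptions, not the TriDF implementation. The function names (`bilinear_sample`, `triplane_features`), the plane layout, the `[0, 1]^3` coordinate normalization, and the choice of concatenation as the aggregation step are all hypothetical; the abstract does not specify these details.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at coordinates u, v in [0, 1].

    u indexes the width axis and v the height axis; both are (N,) arrays.
    Returns an (N, C) array of interpolated features.
    """
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)  # clamp at the right border
    y1 = np.minimum(y0 + 1, h - 1)  # clamp at the bottom border
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[y0, x0] * (1.0 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1.0 - wx) + plane[y1, x1] * wx
    return top * (1.0 - wy) + bottom * wy


def triplane_features(planes, points):
    """Look up features for (N, 3) points normalized to [0, 1]^3.

    planes is a dict with keys "xy", "xz", "yz", each an (H, W, C) array.
    Concatenation is an assumed aggregation; summation or an elementwise
    product are common alternatives. Returns an (N, 3C) array.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return np.concatenate([f_xy, f_xz, f_yz], axis=-1)
```

In a setup like this, the plane values themselves would be the trainable parameters, which is one way to read the abstract's remark that directly optimizing feature planes speeds up convergence: each query touches only a few plane entries rather than passing through a deep network. A small decoder (not shown) would then map the sampled features to color.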

Paper Structure

This paper contains 28 sections, 17 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Few-shot remote sensing novel view synthesis. (a) Our hybrid 3D representation TriDF with depth-guided optimization achieves fast reconstruction from only 3 input views. (b) Rendering quality vs. training time on the LEVIR-NVS dataset. Our method achieves the best rendering quality compared with advanced few-shot methods, and TriDF-10k (early version with only 10k iterations) is comparable to state-of-the-art 3DGS (Kerbl et al., 2023) in reconstruction time.
  • Figure 2: Overview of the proposed TriDF. We introduce an efficient hybrid representation for few-shot novel view synthesis, which takes in sparse posed images and performs fast reconstruction. TriDF consists of a TriPlane branch and a Density Fields branch and separately predicts color and volume density. We also integrate the image-based rendering framework and depth-guided optimization based on 3D point clouds for such hybrid representations to stabilize the training process of few-shot NVS tasks.
  • Figure 3: Qualitative comparison of test-view renderings with 3 input views on the LEVIR-NVS dataset.
  • Figure 4: Detailed qualitative comparison on RGB renderings and depth maps. Our method can synthesize more accurate scene details, such as building edges and corners. The depth maps intuitively demonstrate that our approach captures scene geometry more effectively and exhibits fewer distortions.
  • Figure 5: Three different implementations of our hybrid representation. (a) The triplane branch integrates with the image-based rendering framework, enhancing input information through feature sampling from reference images. (b) Volume density and color are modeled independently, with the density fields branch being integrated into the image-based rendering framework. (c) The density fields simultaneously generate volume density and intermediate features $f_m$, which are subsequently processed by the triplane branch for color prediction, facilitating information interaction between color and volume density components.
  • ...and 4 more figures