psPRF: Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery

Tongtong Zhang, Yuanxiang Li

TL;DR

psPRF introduces a generalized Planar Neural Radiance Field that fuses low-resolution RGB and high-resolution PAN data within an RPC-aware, multimodal encoder framework. By employing Spectral-to-Spatial Convolution, a depth-embedded MPI decoder, and differentiable RPC reprojection, it jointly synthesizes HR-RGB, HR-PAN, and DSM from a single image pair and generalizes across scenes. Experiments on WorldView-3 data show state-of-the-art performance in novel-view synthesis and altitude accuracy, with improved efficiency due to planar rendering. Pan-sharpening emerges as an image-synthesis outcome rather than a separate pre-processing step, enabling practical deployment for satellite 3D reconstruction across varying resolutions and viewpoints.

Abstract

Most current NeRF variants for satellite imagery are designed for one specific scene and fail to generalize to new geometry. Additionally, the RGB images require pan-sharpening as an independent preprocessing step. This paper introduces psPRF, a Planar Neural Radiance Field designed for paired low-resolution RGB (LR-RGB) and high-resolution panchromatic (HR-PAN) images from satellite sensors with Rational Polynomial Cameras (RPC). To capture the cross-modal prior from both the LR-RGB and HR-PAN images, we adapt the encoder of the U-Net-shaped architecture with explicit spectral-to-spatial convolution (SSConv) to enhance its multimodal representation ability. To support the generalization of psPRF across scenes, we adopt a projection loss to ensure strong geometry self-supervision. The proposed method is evaluated on multi-scene WorldView-3 LR-RGB and HR-PAN pairs and achieves state-of-the-art performance.
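
To make the planar (multiplane image, MPI) rendering named in the TL;DR concrete, the following is a minimal sketch of standard front-to-back "over" compositing along a single pixel. It is an illustration of the general MPI technique under stated assumptions, not the paper's implementation: the function name `composite_planes` and the plain-list inputs are hypothetical, and the paper's decoder, plane sampling, and RPC warping are not modeled here.

```python
def composite_planes(values, alphas):
    """Front-to-back 'over' compositing along one pixel of a multiplane image.

    values: per-plane quantity (an intensity, or the plane's altitude),
            ordered from the plane nearest the sensor to the farthest
    alphas: per-plane opacity in [0, 1], in the same order
    (Both names are illustrative assumptions, not taken from the paper.)
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed by nearer planes
    result = 0.0
    for value, alpha in zip(values, alphas):
        # Each plane contributes its value, weighted by its own opacity
        # and by how much of the ray survived the planes in front of it.
        result += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return result


# A half-transparent bright plane in front of an opaque dark plane.
intensity = composite_planes([1.0, 0.0], [0.5, 1.0])
# The same weights applied to per-plane altitudes give an expected altitude.
altitude = composite_planes([10.0, 20.0], [0.5, 1.0])
```

Applying the same weights to colors and to per-plane altitudes is what lets one set of planes yield both an image and a height estimate; how psPRF assigns altitudes to planes is described in the paper itself, not in this sketch.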

Paper Structure (31 sections, 17 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: The pipeline of psPRF, with a multimodal encoder and a multitask decoder that produce multiscale MPIs.
  • Figure 2: Reprojection from the synthesized product of the source view to the products of the target view by warping the frustum.
  • Figure 3: Reprojection from the synthesized altitude map to the rendered image of the source view.
  • Figure 4: When all the input images are from a single scene, psPRF produces satisfactory results even with very few input views. In contrast, the performance of EO-NeRF (SatNeRF) and SatensoRF is severely affected when the number of input views is reduced. While rpcPRF is robust to view reduction, it fails to enhance the resolution according to the input HR-PANs.
  • Figure 5: Ablation study of how different levels of reprojection loss affect the rendering results.
  • ...and 4 more figures