FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions

Michael Sprintson, Rama Chellappa, Cheng Peng

TL;DR

FusionRF tackles digital surface reconstruction from satellite imagery by eliminating the need for pansharpening preprocessing. It introduces a satellite NeRF that jointly optimizes on full-channel multispectral and panchromatic inputs using a sparse cross-resolution kernel and multimodal/transient embeddings to intrinsically fuse information and render high-fidelity novel views. The approach yields a 17% average reduction in depth MAE compared to baselines and demonstrates robustness to limited panchromatic data while being adaptable to EO-NeRF-style extensions. This work offers a practical path to accurate DSM reconstruction from commodity satellite datasets without hand-crafted pansharpening, enhancing reliability across domains and sensor conditions.

Abstract

We introduce FusionRF, a novel framework for digital surface reconstruction from satellite multispectral and panchromatic images. Current work has demonstrated the increased accuracy of neural photogrammetry for surface reconstruction from optical satellite images compared to algorithmic methods. Common satellites produce both a panchromatic and multispectral image, which contain high spatial and spectral information respectively. Current neural reconstruction methods require multispectral images to be upsampled with a pansharpening method using the spatial data in the panchromatic image. However, these methods may introduce biases and hallucinations due to domain gaps. FusionRF introduces joint image fusion during optimization through a novel cross-resolution kernel that learns to resolve spatial resolution loss present in multispectral images. As input, FusionRF accepts the original multispectral and panchromatic data, eliminating the need for image preprocessing. FusionRF also leverages multimodal appearance embeddings that encode the image characteristics of each modality and view within a uniform representation. By optimizing on both modalities, FusionRF learns to fuse image modalities while performing reconstruction tasks and eliminates the need for a pansharpening preprocessing step. We evaluate our method on multispectral and panchromatic satellite images from the WorldView-3 satellite in various locations, and show that FusionRF provides an average of 17% reduction in depth reconstruction error, and renders sharp training and novel views.

Paper Structure (20 sections, 11 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Model Comparison: The optimization of previous models, such as S-NeRF and Sat-NeRF [derksen2021shadow, mari2022sat], is shown in red: the input data is first pansharpened, and the loss is then computed only against the resulting pansharpened image. Our method independently trains a NeRF $(F_{\Theta})$ on multispectral and panchromatic images, optimizing against the original input images. The inclusion of the cross-resolution kernel ($F_{\Psi}$) encourages the model to learn to perform image fusion during training. During evaluation, the cross-resolution kernel is disabled, rendering novel view multispectral images with increased spatial resolution.
  • Figure 2: Our Network Architecture: Every input ray $r$ is projected from origin $\textbf{o}$ in $I_{lms}$ and $I_{pan}$ to the same ground point $g$. The sparse cross-resolution kernel $F_{\Psi}$ predicts weights $w_q$ for static locations $\{q\}$ surrounding $\textbf{o}$, which are then combined with the color predictions $\{c_q\}$ to produce the final output color.
  • Figure 3: Embedding Diagram: One modal embedding is shared across all panchromatic images and another across all multispectral images, allowing the model to encode modal information in a uniform representation. Each image view, represented by date of capture, also shares one image embedding across both the panchromatic and multispectral images. In the case of a view containing only one modality, the image embedding is unique to that image.
  • Figure 4: Pansharpening Visual Comparison: The first row shows the original multispectral and panchromatic images alongside FusionRF's generated image and pansharpened images from deep learning methods. The images show the tendency of pansharpening methods to hallucinate color casts and additional details on buildings, as well as the variance in the final results these methods produce. The second row shows the LIDAR depth map provided by the dataset, the rendered depth from a FusionRF model trained on the raw panchromatic and multispectral data, and the rendered depth from FusionRF models trained on pansharpened imagery. Finally, the third row shows the error maps of these reconstructions against the LIDAR depth map. This comparison shows that achieving optimal image quality does not improve depth reconstruction.
  • Figure 5: Spectral Comparison: For each of the eight bands of the WorldView-3 image shown in Figure 4, we plot the spectral density in the form of a histogram of pixel values. We display results for the FusionRF-generated views from the multispectral and panchromatic data as well as the images pansharpened by PSDip and DRPNN.
  • ...and 6 more figures
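
The Figure 2 caption describes the cross-resolution kernel as predicting weights $w_q$ for locations $\{q\}$ around the ray origin and combining them with the color predictions $\{c_q\}$. A minimal sketch of that combination step is given here, under stated assumptions: the weights are taken as a softmax over raw scores (the caption does not say how they are normalized), the scores are passed in directly rather than predicted by $F_{\Psi}$, and the names `softmax` and `fuse_colors` are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1.

    Assumption: softmax normalization is an illustrative choice,
    not something the figure caption specifies.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_colors(kernel_scores, colors):
    """Blend per-location colors c_q with weights w_q, band by band.

    kernel_scores: one raw score per kernel location q (hypothetical
                   stand-in for the output of F_Psi).
    colors:        one color vector c_q per location, each a list with
                   one value per spectral band.
    """
    if len(kernel_scores) != len(colors):
        raise ValueError("need exactly one score per color prediction")
    weights = softmax(kernel_scores)
    n_bands = len(colors[0])
    # Weighted sum over the kernel locations, computed for every band.
    return [
        sum(w * c[band] for w, c in zip(weights, colors))
        for band in range(n_bands)
    ]


if __name__ == "__main__":
    # Four kernel locations with equal scores, two bands per color.
    scores = [0.0, 0.0, 0.0, 0.0]
    colors = [[0.2, 0.4], [0.4, 0.4], [0.6, 0.4], [0.8, 0.4]]
    print(fuse_colors(scores, colors))
```

With equal scores every location receives the same weight, so the output is the per-band mean of the inputs. Per the Figure 1 caption, the kernel is disabled during evaluation, so in this sketch the blending step would simply be skipped at render time and the color predicted for the ray itself would be used.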