Fast and Generalizable NeRF Architecture Selection for Satellite Scene Reconstruction

Devjyoti Chakraborty, Zaki Sukma, Rakandhiya D. Rachmanto, Kriti Ghosh, In Kee Kim, Suchendra M. Bhandarkar, Lakshmish Ramaswamy, Nancy K. O'Hare, Deepak Mishra

Abstract

Neural Radiance Fields (NeRF) have emerged as a powerful approach for photorealistic 3D reconstruction from multi-view images. However, deploying NeRF for satellite imagery remains challenging: each scene requires individual training, and optimizing architectures via Neural Architecture Search (NAS) demands hours to days of GPU time. While existing approaches focus on architectural improvements, our SHAP analysis reveals that multi-view consistency, rather than model architecture, determines reconstruction quality. Based on this insight, we develop PreSCAN, a predictive framework that estimates NeRF quality prior to training using lightweight geometric and photometric descriptors. PreSCAN selects suitable architectures in $<$ 30 seconds with $<$ 1 dB prediction error, achieving a 1000$\times$ speedup over NAS. We further demonstrate PreSCAN's deployment utility on edge platforms (Jetson Orin), where combining its predictions with offline cost profiling reduces inference power by 26% and latency by 43% with minimal quality loss. Experiments on DFC2019 datasets confirm that PreSCAN generalizes across diverse satellite scenes without retraining.
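
To make the idea of lightweight scene descriptors concrete, the following is a minimal sketch of two such quantities: an average cosine similarity between viewing directions and an average per-image photometric variance. The function names, the input layout (viewing directions as 3-vectors, images as flat lists of intensities), and the exact formulas are assumptions made for illustration; the abstract does not specify how PreSCAN computes its descriptors.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance


def mean_pairwise_cosine(view_dirs):
    """Average cosine similarity over all unordered pairs of viewing directions.

    Hypothetical descriptor: assumes at least two non-zero direction vectors.
    """
    unit = []
    for v in view_dirs:
        norm = math.sqrt(sum(c * c for c in v))
        unit.append([c / norm for c in v])
    sims = [sum(a * b for a, b in zip(u, w)) for u, w in combinations(unit, 2)]
    return fmean(sims)


def mean_photometric_variance(images):
    """Average of the per-image intensity variance.

    Hypothetical descriptor: each image is a flat list of pixel intensities.
    """
    return fmean(pvariance(img) for img in images)


# Toy inputs chosen for illustration only; they are not taken from any dataset.
toy_dirs = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
toy_images = [[1, 1, 1, 1], [0, 2, 0, 2]]
print(mean_pairwise_cosine(toy_dirs), mean_photometric_variance(toy_images))
```

Both functions run in time that depends only on the number of views and pixels, which is the property that would let descriptors of this kind be computed before any NeRF training.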

Paper Structure (25 sections, 10 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Overview of the PreSCAN framework. PreSCAN predicts suitable NeRF architectures before training by modeling the interaction between architectural parameters and scene descriptors. Using lightweight inputs (e.g., view alignment, sampling patterns), it enables fast ($<$ 30 s), generalizable, and scalable architecture selection without per-scene retraining.
  • Figure 2: SHAP value analysis across JAX, OMA, and combined datasets. (a) 'JAX scenes': Inverse PSNR ($w'$), photometric variance ($\overline{\text{Var}(I_i)}$), and cosine similarity ($\overline{\text{cos sim}}$) are the most influential. (b) 'OMA scenes': The same top features are consistently identified, despite scene differences. (c) 'Combined dataset': Feature rankings remain stable, confirming their general importance across datasets.
  • Figure 3: MAE ($\Delta$PSNR) distributions for different NeRF architectures across scenes. Each subplot shows prediction consistency per scene, with most errors $<$ 1$\,\mathrm{dB}$ and many clustering $<$ 0.5$\,\mathrm{dB}$, demonstrating PreSCAN's robust generalization.
  • Figure 4: Distribution of absolute prediction errors ($\Delta$PSNR) across architectural parameters. (a) 'Layers': NeRF models with 6 and 8 layers yield tighter, more stable error distributions, while 10-layer models show a wider spread and occasional outliers. (b) 'Feature Dimensionality': 128-feature models produce the most consistent predictions. (c) 'Number of Samples': Increasing the number of samples beyond 32 offers diminishing returns, with all settings clustering near 1$\,\mathrm{dB}$ error.
  • Figure 5: Comparison of power consumption and time between the most accurate model and the cost-efficient model on the edge device (blue indicates the baseline; red indicates the architecture selected by PreSCAN). (a) Training power consumption, (b) Rendering power consumption, (c) Training time, (d) Rendering time, (e) Energy-PSNR tradeoff. PreSCAN with hardware-aware architecture selection achieves significant efficiency gains with minimal quality loss.
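
The hardware-aware selection summarized in Figure 5 can be illustrated with a small sketch: among candidate architectures, keep those whose predicted PSNR is within a tolerance of the best prediction, then pick the one with the lowest profiled energy. The selection rule, the `max_psnr_drop_db` tolerance, the dictionary keys, and all numeric values below are assumptions for illustration; they are not the paper's procedure or measurements.

```python
def select_architecture(candidates, max_psnr_drop_db=1.0):
    """Return the lowest-energy candidate within a PSNR tolerance of the best.

    Hypothetical rule: `pred_psnr_db` stands for a predicted quality score and
    `energy_j` for an offline-profiled cost; both keys are illustrative names.
    """
    if not candidates:
        raise ValueError("at least one candidate architecture is required")
    best_psnr = max(c["pred_psnr_db"] for c in candidates)
    eligible = [
        c for c in candidates
        if best_psnr - c["pred_psnr_db"] <= max_psnr_drop_db
    ]
    return min(eligible, key=lambda c: c["energy_j"])


# Placeholder values for illustration only; not results from the paper.
candidates = [
    {"name": "layers8_feat256_samples64", "pred_psnr_db": 20.0, "energy_j": 100.0},
    {"name": "layers6_feat128_samples32", "pred_psnr_db": 19.5, "energy_j": 60.0},
    {"name": "layers6_feat64_samples16", "pred_psnr_db": 17.0, "energy_j": 40.0},
]
chosen = select_architecture(candidates)
print(chosen["name"])
```

Setting `max_psnr_drop_db` to zero recovers the most accurate candidate, while a larger tolerance trades predicted quality for lower cost.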