SwiftGS: Episodic Priors for Immediate Satellite Surface Recovery

Rong Fu, Jiekai Wu, Haiyun Wei, Xiaowen Ma, Shiyin Lin, Kangan Qian, Chuang Liu, Jianyuan Ni, Simon James Fong

Abstract

Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization. We introduce SwiftGS, a meta-learned system that reconstructs 3D surfaces in a single forward pass by predicting geometry-radiation-decoupled Gaussian primitives together with a lightweight SDF, replacing expensive per-scene fitting with episodic training that captures transferable priors. The model couples a differentiable physics graph for projection, illumination, and sensor response with spatial gating that blends sparse Gaussian detail and global SDF structure, and incorporates semantic-geometric fusion, conditional lightweight task heads, and multi-view supervision from a frozen geometric teacher under an uncertainty-aware multi-task loss. At inference, SwiftGS operates zero-shot with optional compact calibration and achieves accurate DSM reconstruction and view-consistent rendering at significantly reduced computational cost, with ablations highlighting the benefits of the hybrid representation, physics-aware rendering, and episodic meta-training.
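The abstract mentions an uncertainty-aware multi-task loss but does not give its form here. As a rough illustration only, the sketch below assumes a common homoscedastic-uncertainty weighting, in which each task loss $L_i$ is scaled by $\exp(-s_i)$ and regularized by a learned log-variance $s_i$. The function name, the task names in the comments, and the weighting form itself are assumptions, not the paper's stated formulation.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    ASSUMPTION: this weighting is a common choice for uncertainty-aware
    multi-task training; the paper's exact loss may differ.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        # A large s_i down-weights a noisy task; the + s_i term keeps
        # s_i from growing without bound.
        total += math.exp(-s) * loss + s
    return total


# Hypothetical per-task losses (e.g. DSM, rendering, shadow).
losses = [0.8, 0.2, 0.5]

# With all log-variances at zero the result is the plain sum of losses.
plain_sum = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0])

# Raising the first task's log-variance reduces its influence.
reweighted = uncertainty_weighted_loss(losses, [1.0, 0.0, 0.0])
```

In a real training loop the log-variances would be learnable parameters updated together with the network weights; here they are fixed numbers so the arithmetic is easy to follow.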

Paper Structure (52 sections, 5 theorems, 54 equations, 6 figures, 15 tables, 1 algorithm)

Key Result

Proposition A.1

Suppose the true radiance--density field $\mathcal{W}^\star$ admits the decomposition ... where $\lambda_{\mathrm{g}}^\star$ and $\lambda_{\mathrm{s}}^\star$ are measurable functions satisfying $\lambda_{\mathrm{g}}^\star(x)+\lambda_{\mathrm{s}}^\star(x)=1$ for every $x$. Let the model class contain predictors $\mathcal{W}_{\Phi}$ indexed by parameters $\Phi$, and denote by $\lambda_g$ ...
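
The constraint that the two gates sum to one at every point can be illustrated with a small sketch. It assumes the gates form a pointwise convex combination of a Gaussian-branch value and an SDF-branch value, with the Gaussian gate produced by a sigmoid over a scalar logit. The decomposition itself is not shown in this excerpt, so the blending form, the sigmoid parameterization, and all names below are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def blend(gauss_value, sdf_value, gate_logit):
    """Blend two branch outputs with gates that sum to one.

    ASSUMPTION: a convex combination lam_g * gauss + lam_s * sdf with
    lam_s = 1 - lam_g; the paper's decomposition may differ.
    """
    lam_g = sigmoid(gate_logit)
    lam_s = 1.0 - lam_g  # enforces lam_g + lam_s = 1 by construction
    return lam_g * gauss_value + lam_s * sdf_value, lam_g, lam_s


# A zero logit weights both branches equally.
value, lam_g, lam_s = blend(10.0, 20.0, 0.0)

# A very large logit hands the point entirely to the Gaussian branch.
gauss_only, _, _ = blend(10.0, 20.0, 1000.0)
```

Defining the second gate as one minus the first removes the need for a separate normalization step, which is one simple way to satisfy the sum-to-one condition in the proposition.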

Figures (6)

  • Figure 1: Overview of the SwiftGS architecture for efficient, zero-shot satellite surface reconstruction. The pipeline begins with Multi-View Encoding, where per-view features and a global scene latent are extracted. A Hybrid Decoder produces a compact Gaussian set $\Gamma$, an implicit SDF $S_{\psi}$, and spatial gates that blend sparse and dense components. Lightweight Task-Specific Heads optionally refine geometry or appearance. A Differentiable Physics Graph renders elevation and albedo using camera geometry and illumination, supported by distilled geometric cues during training. An episodic meta-training scheme learns shared parameters $\Phi$ with a small per-scene calibration vector $\theta$. At inference, SwiftGS runs zero-shot and outputs a DSM, consistent renderings, and a compact Gaussian memory for incremental reconstruction.
  • Figure 2: Shadow and lighting consistency: input, predicted shadow, rendered image, albedo, and ground truth.
  • Figure 3: Training curves showing stable convergence of query loss, DSM error, and shadow loss.
  • Figure 4: Qualitative comparison across scenes. Each row shows input images, predicted DSM and renderings from SwiftGS and EO-NeRF, along with ground truth. MAE and LPIPS errors are annotated.
  • Figure 5: Representative failure cases of SwiftGS. Top row: dense high-rise urban canyon exhibiting height hallucination due to severe occlusion. Middle row: water body with systematic elevation overestimation caused by specular reflection violating diffuse reflectance assumptions. Bottom row: extreme shadow coverage ($>60\%$) showing incomplete shadow removal and associated height artifacts. For each case, columns show input image, predicted DSM, error map (red indicates overestimation, blue underestimation), and ground truth DSM.
  • ...and 1 more figure
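
The figure captions refer to DSM MAE and to signed error maps in which positive errors are overestimation and negative errors are underestimation. A minimal sketch of how such quantities could be computed from two height grids follows. The function name and the toy grids are hypothetical, and the paper's evaluation protocol (masking, alignment, units) is not described in this excerpt.

```python
def dsm_error_map(pred, truth):
    """Return (signed_error_grid, mean_absolute_error) for two DSM grids.

    Signed error is pred - truth, so positive cells are overestimated
    heights and negative cells are underestimated heights.
    ASSUMPTION: grids are equal-shaped lists of rows, with no masking.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("predicted and ground-truth grids must match in shape")
    signed = [
        [p - t for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(pred, truth)
    ]
    count = sum(len(row) for row in signed)
    if count == 0:
        raise ValueError("grids must contain at least one cell")
    mae = sum(abs(e) for row in signed for e in row) / count
    return signed, mae


# Toy 2x2 height grids (hypothetical values).
predicted = [[10.0, 12.0], [8.0, 9.0]]
reference = [[9.0, 12.0], [10.0, 9.0]]

error_grid, mae = dsm_error_map(predicted, reference)
```

In this toy case one cell is overestimated by 1, one is underestimated by 2, and two are exact, so the mean absolute error over four cells is 0.75.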

Theorems & Definitions (7)

  • Proposition A.1
  • Proposition A.2
  • Lemma A.3
  • Proposition A.4
  • proof
  • Proposition A.5
  • proof