RareFlow: Physics-Aware Flow-Matching for Cross-Sensor Super-Resolution of Rare-Earth Features

Forouzan Fallah, Wenwen Li, Chia-Yu Hsu, Hyunho Lee, Yezhou Yang

TL;DR

RareFlow tackles cross-sensor remote-sensing super-resolution under out-of-distribution conditions by fusing physics-aware diffusion with dual conditioning: a Gated ControlNet preserves LR geometry, and text-driven semantic priors guide the synthesis of rare features. It introduces an uncertainty-aware gating mechanism and a physics-aware objective comprising spectral, radiometric, and perceptual losses, training only learnable control adapters on top of a frozen backbone. On a curated cross-sensor RTS benchmark, RareFlow achieves state-of-the-art perceptual realism (low LPIPS, DISTS, and FID) while maintaining high fidelity, as evidenced by strong PSNR/SSIM/FSIM scores and expert qualitative assessments; its uncertainty estimates also help identify unfamiliar inputs and thereby reduce hallucinations. The results underscore the approach’s potential for high-fidelity, physics-consistent synthesis in data-scarce scientific domains and point toward reliable cross-domain generation under severe domain shift.
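The TL;DR names the three loss terms but not their form. As a minimal sketch, assuming the spectral term is the mean per-pixel spectral angle (SAM) between SR and HR band vectors, the radiometric term compares per-band mean and standard deviation of the box-downsampled SR output with the LR input, and the perceptual term is a pluggable callable (in practice something like LPIPS), a weighted objective could look like the following. All function names and the weights `w_spec`, `w_rad`, `w_perc` are hypothetical, not the authors' definitions.

```python
import numpy as np

def spectral_angle_loss(sr, hr, eps=1e-8):
    """Mean spectral angle (radians) between per-pixel band vectors.

    Hypothetical spectral term; sr and hr are (H, W, C) arrays.
    """
    dot = np.sum(sr * hr, axis=-1)
    norms = np.linalg.norm(sr, axis=-1) * np.linalg.norm(hr, axis=-1)
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))

def box_downsample(img, factor):
    """Average non-overlapping factor x factor blocks of an (H, W, C) image."""
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    img = img[: h2 * factor, : w2 * factor]
    return img.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

def radiometric_loss(sr, lr, factor):
    """Hypothetical radiometric term: per-band mean/std gap between
    the downsampled SR output and the LR input."""
    down = box_downsample(sr, factor)
    mean_gap = np.abs(down.mean(axis=(0, 1)) - lr.mean(axis=(0, 1)))
    std_gap = np.abs(down.std(axis=(0, 1)) - lr.std(axis=(0, 1)))
    return float(np.mean(mean_gap + std_gap))

def physics_aware_loss(sr, hr, lr, factor, perceptual_fn,
                       w_spec=1.0, w_rad=1.0, w_perc=1.0):
    """Weighted sum of spectral, radiometric, and perceptual terms."""
    return (w_spec * spectral_angle_loss(sr, hr)
            + w_rad * radiometric_loss(sr, lr, factor)
            + w_perc * perceptual_fn(sr, hr))

# Usage with an L1 placeholder standing in for a learned perceptual metric.
rng = np.random.default_rng(0)
hr = rng.uniform(0.1, 1.0, size=(8, 8, 4))
lr = box_downsample(hr, 2)
l1 = lambda a, b: float(np.mean(np.abs(a - b)))
noisy_sr = hr + rng.normal(0.0, 0.1, size=hr.shape)
print(physics_aware_loss(noisy_sr, hr, lr, 2, l1))
```

The spectral-angle term is insensitive to a uniform brightness scaling of a pixel, so in this sketch absolute intensity is policed only by the radiometric term.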

Abstract

Super-resolution (SR) for remote sensing imagery often fails under out-of-distribution (OOD) conditions, such as rare geomorphic features captured by diverse sensors, producing visually plausible but physically inaccurate results. We present RareFlow, a physics-aware SR framework designed for OOD robustness. RareFlow's core is a dual-conditioning architecture: a Gated ControlNet preserves fine-grained geometric fidelity from the low-resolution input, while textual prompts provide semantic guidance for synthesizing complex features. To ensure physically sound outputs, we introduce a multifaceted loss function that enforces both spectral and radiometric consistency with sensor properties. Furthermore, the framework quantifies its own predictive uncertainty by running multiple stochastic forward passes; the variance across the resulting outputs flags unfamiliar inputs, mitigating feature hallucination. We validate RareFlow on a new, curated benchmark of multi-sensor satellite imagery. In blind evaluations, geophysical experts rated our model's outputs as approaching the fidelity of ground-truth imagery, significantly outperforming state-of-the-art baselines. This qualitative superiority is corroborated by quantitative gains in perceptual metrics, including a nearly 40% reduction in FID. RareFlow provides a robust framework for high-fidelity synthesis in data-scarce scientific domains and offers a new paradigm for controlled generation under severe domain shift.
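The abstract describes uncertainty as the variance of outputs across stochastic forward passes. A minimal sketch of that idea follows, assuming each pass is a callable that takes the LR image and a random generator. The toy sampler (nearest-neighbour upsampling plus Gaussian noise), the mean-variance score, and the threshold are all hypothetical stand-ins for RareFlow's actual sampler and decision rule.

```python
import numpy as np

def make_toy_sampler(noise_std, scale=2):
    """Hypothetical stand-in for one stochastic SR pass:
    nearest-neighbour upsampling plus Gaussian noise."""
    def sample(lr, rng):
        up = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
        return up + rng.normal(0.0, noise_std, size=up.shape)
    return sample

def stochastic_uncertainty(sample_fn, lr, n_passes=8, seed=0):
    """Run n_passes stochastic passes; return per-pixel mean and variance."""
    rng = np.random.default_rng(seed)
    samples = np.stack([sample_fn(lr, rng) for _ in range(n_passes)])
    return samples.mean(axis=0), samples.var(axis=0)

def flag_unfamiliar(var, threshold):
    """Summarize the variance map as one score; flag the input if it exceeds threshold."""
    score = float(var.mean())
    return score, score > threshold

# Usage: a low-noise sampler mimics a familiar input, a high-noise one an unfamiliar input.
lr = np.zeros((4, 4, 3))
_, var_familiar = stochastic_uncertainty(make_toy_sampler(0.01), lr)
_, var_unfamiliar = stochastic_uncertainty(make_toy_sampler(0.5), lr)
print(flag_unfamiliar(var_familiar, 0.01), flag_unfamiliar(var_unfamiliar, 0.01))
```

Keeping the full per-pixel variance map, rather than only the scalar score, would also show where in the tile the model is least certain.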


Paper Structure

This paper contains 37 sections, 38 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: When the LR input is blurry or semantically OOD, spatial-only guidance preserves coarse morphology yet remains soft, while semantic-only guidance hallucinates plausible—but incorrect—textures. RareFlow balances structural evidence from the LR image with textual semantics, suppressing hallucinations and preserving physically consistent geometry and spectra.
  • Figure 2: (a) The training architecture of RareFlow. The control path (orange) consumes LR latents and caption tokens to produce residual hints $r_i$ and predicts per-block scalars $\alpha_i \in [0,1]$ that scale $r_i$ before injection into the features of the frozen backbone (blue): $F_i \leftarrow F_i + \alpha_i r_i$. (b) Internals of the ControlNet MM-DiT block, which produces $\alpha_i$ (see the subsection on the control scalar).
  • Figure 3: Data challenges. Left to right: HR (Maxar), LR (Sentinel-2, Percentile Norm), LR (Sentinel-2, Fixed Norm). Row 1 shows color discrepancies and the effect of normalization. Row 2 shows spatial misalignment. Row 3 shows temporal misalignment due to changes in snow cover.
  • Figure 4: Qualitative comparison on paired LR–HR data.
  • Figure 5: Visual comparison of model variants.
  • ...and 3 more figures
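The Figure 2 caption describes each control block emitting a residual hint $r_i$ and a scalar $\alpha_i \in [0,1]$, injected into the frozen backbone as $F_i \leftarrow F_i + \alpha_i r_i$. The sketch below illustrates only that update rule on token features of shape (N, D). How RareFlow actually predicts $\alpha_i$ is not stated in this summary, so the gate here (a sigmoid over mean-pooled control features with per-block parameters `w`, `b`) is a hypothetical stand-in, as are all function names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_gate(r, w, b):
    """Hypothetical gate: scalar alpha in [0, 1] from mean-pooled control tokens r (N, D)."""
    pooled = r.mean(axis=0)          # (D,)
    return float(sigmoid(pooled @ w + b))

def inject(backbone_feats, control_residuals, gate_params):
    """Apply F_i <- F_i + alpha_i * r_i block by block; backbone inputs are not modified."""
    out, alphas = [], []
    for F, r, (w, b) in zip(backbone_feats, control_residuals, gate_params):
        alpha = predict_gate(r, w, b)
        out.append(F + alpha * r)
        alphas.append(alpha)
    return out, alphas

# Usage: three blocks of 10 tokens with 16 channels each.
rng = np.random.default_rng(0)
D = 16
feats = [rng.normal(size=(10, D)) for _ in range(3)]
hints = [rng.normal(size=(10, D)) for _ in range(3)]
params = [(rng.normal(size=D), 0.0) for _ in range(3)]
fused, alphas = inject(feats, hints, params)
print([round(a, 3) for a in alphas])
```

Bounding $\alpha_i$ to $[0,1]$ means each block's contribution ranges from nothing ($\alpha_i = 0$, backbone features pass through unchanged) to the full residual ($\alpha_i = 1$).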