Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection

Seyedehanita Madani, Vishal M. Patel

TL;DR

The work tackles robust change detection under severe spatial and temporal misalignment in remote sensing. It introduces a modular pipeline that first bridges appearance gaps with diffusion-based semantic morphing (DiffMorpher), then estimates stepwise dense registrations with RoMa, and finally refines the accumulated motion using ResidualRefinerNet, without retraining any CD backbone. The approach is model-agnostic and demonstrates consistent improvements in both registration accuracy and downstream CD across multiple datasets and backbones. Practically, this diffusion-bridged, refinement-based alignment offers a plug-and-play solution for operational CD tasks with real-world domain shifts.

Abstract

Remote sensing change detection is often challenged by spatial misalignment between bi-temporal images, especially when acquisitions are separated by long seasonal or multi-year gaps. While modern convolutional and transformer-based models perform well on aligned data, their reliance on precise co-registration limits their robustness in real-world conditions. Existing joint registration-detection frameworks typically require retraining and transfer poorly across domains. We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks. The framework integrates diffusion-based semantic morphing, dense registration, and residual flow refinement. A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences between consecutive frames. The composed flow is then refined through a lightweight U-Net to produce a high-fidelity warp that co-registers the original image pair. Extensive experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection across multiple backbones, demonstrating the generality and effectiveness of the proposed approach.
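The step in which stepwise correspondences are chained into one composed flow can be illustrated with a small NumPy sketch. It assumes each flow is a dense displacement field in pixel units, stored as an `(H, W, 2)` array of `(dx, dy)`, and that composition follows the usual rule $F_{0 \to t+1}(x) = F_{0 \to t}(x) + F_{t \to t+1}(x + F_{0 \to t}(x))$. The function names, the array layout, and the border clamping are illustrative assumptions, not the authors' implementation or RoMa's output format.

```python
import numpy as np


def bilinear_sample(field, ys, xs):
    """Sample field (H, W, C) at float coordinates, clamping to the border."""
    h, w = field.shape[:2]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def compose_flows(flows):
    """Chain stepwise displacement fields into a single field.

    flows: list of (H, W, 2) arrays, channel 0 = dx, channel 1 = dy (pixels).
    This layout is an assumption made for the sketch.
    """
    h, w = flows[0].shape[:2]
    grid_y, grid_x = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    total = np.zeros((h, w, 2))
    for flow in flows:
        # Read the next step's displacement at the position reached so far.
        step = bilinear_sample(flow, grid_y + total[..., 1], grid_x + total[..., 0])
        total = total + step
    return total
```

Under these assumptions, chaining several small displacements between neighbouring intermediate frames reproduces a large displacement between the original pair, which is the motivation for estimating correspondences stepwise rather than in one shot.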

Paper Structure

This paper contains 17 sections, 8 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Pipeline overview. Given bi-temporal images $I_A$ and $I_B$, DiffMorpher generates $K$ semantic intermediates. RoMa estimates stepwise flows $\{F_{t \to t+1}\}$, which are composed ($\circ$) into a global warp $F_{A \to B}$ and refined by a Residual Flow Refinement module to $\hat{F}_{A \to B}$. The refined flow warps ($\odot$) $I_B$ to $I_B'$, and the pair $(I_A, I_B')$ is fed to a frozen CD backbone to produce the change map. Dashed arrows denote raw inputs ($I_A, I_B$), while solid arrows denote intermediate signals. (Illustrated with $K=5$.)
  • Figure 2: Qualitative visualization of intermediate generation. Top: Intermediate images generated via DiffMorpher. Bottom: their warped versions aligned to $I_B$. These sequential morphs enable more accurate motion decomposition and alignment across large scene shifts.
  • Figure 3: ResidualRefinerNet architecture. The input pair $(I_A, I_B)$ is encoded to a $32\times$ downsampled feature map. RoMa flow is projected and fused at the bottleneck. The decoder progressively upsamples and predicts residual flow $\Delta F$.
  • Figure 4: Final alignment and flow visualization. Top: composed vs. direct RoMa warp and our refined result. Bottom: corresponding flow maps (input, GT, refined) reveal improvements in structure and smoothness.
  • Figure 5: CD outputs. From left to right: ground truth mask, prediction from unaligned pair, and predictions after alignment using direct RoMa, composed RoMa, and refined RoMa.
  • ...and 1 more figure
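The refinement and warping steps described in the captions for Figures 1 and 3 can be sketched as follows. The sketch assumes the refined flow is the additive combination $\hat{F}_{A \to B} = F_{A \to B} + \Delta F$, that flows are pixel displacements stored as `(H, W, 2)` arrays of `(dx, dy)`, and that warping is backward bilinear sampling, $I_B'(x) = I_B(x + \hat{F}_{A \to B}(x))$. The residual is passed in as a plain array standing in for the network prediction; all names here are hypothetical and do not reflect the authors' code.

```python
import numpy as np


def warp_image(image, flow):
    """Backward-warp a single-channel image (H, W) with flow (H, W, 2)."""
    h, w = image.shape
    grid_y, grid_x = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    # Sampling positions in the source image, clamped to the border.
    xs = np.clip(grid_x + flow[..., 0], 0, w - 1)
    ys = np.clip(grid_y + flow[..., 1], 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0
    wy = ys - y0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def refine_and_warp(image_b, composed_flow, residual_flow):
    """Add a residual to the composed flow, then warp I_B with the result.

    residual_flow stands in for the output of a refinement network; the
    additive update is an assumption of this sketch.
    """
    refined_flow = composed_flow + residual_flow
    return warp_image(image_b, refined_flow), refined_flow
```

In a pipeline of this shape, the warped image and $I_A$ would then be passed to the unchanged change detection backbone, so only the alignment stage depends on the flow estimate.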