
Generalization-aware Remote Sensing Change Detection via Domain-agnostic Learning

Qi Zang, Shuang Wang, Dong Zhao, Dou Quan, Yang Hu, Licheng Jiao

TL;DR

DonaNet addresses pseudo-changes in remote sensing change detection that arise from spectral or style shifts between bitemporal images. It treats local channel-wise statistics as style proxies and learns domain-agnostic representations through two components: a domain difference removal (DDR) module, which uses global-to-local normalization or whitening to strip domain-specific style while preserving discriminability, and cross-temporal generalization learning (CTGL), which combines cross-temporal style transformation with a consistency loss so that features focus on content rather than style. The authors report state-of-the-art results on three public datasets with a smaller model size and greater robustness to domain shift. The framework reduces reliance on distribution alignment and GAN-based style transfer.

Abstract

Change detection is of essential significance for regional development, and pseudo-changes between bitemporal images induced by imaging environment factors are a key challenge. Existing transformation-based methods regard pseudo-changes as a kind of style shift and alleviate them by transforming bitemporal images into the same style using generative adversarial networks (GANs). However, these methods have two drawbacks: 1) transformed images suffer from distortion that reduces feature discrimination; 2) alignment hinders the model from learning domain-agnostic representations, which degrades performance on scenes whose domain is shifted from the training data. Therefore, starting from pseudo-changes caused by style differences, we present a generalizable domain-agnostic difference learning network (DonaNet). For drawback 1), we argue for local-level statistics as style proxies to counter domain shifts. For drawback 2), DonaNet learns domain-agnostic representations by removing the domain-specific style of encoded features and highlighting the class characteristics of objects. For the removal, we propose a domain difference removal module that reduces feature variance while preserving discriminative properties, and an enhanced version that can eliminate more style by decorrelating features. For the highlighting, we propose a cross-temporal generalization learning strategy that imitates latent domain shifts, enabling the model to actively extract feature representations that are more robust to such shifts. Extensive experiments conducted on three public datasets demonstrate that DonaNet outperforms existing state-of-the-art methods with a smaller model size and is more robust to domain shift.
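
The idea of using local-level channel-wise statistics as style proxies can be illustrated with a small NumPy sketch. It is not the authors' DDR module: the function names, the non-overlapping square patches, and the `patch` and `eps` parameters are assumptions made for illustration only. The sketch computes a mean and standard deviation per channel and per patch, then subtracts and divides them out, so that a per-channel brightness or contrast change leaves the normalized result almost unchanged.

```python
import numpy as np

def local_channel_stats(feat, patch):
    """Per-patch, per-channel mean and std of a (C, H, W) array.

    Assumes H and W are divisible by `patch` (a simplification for this sketch).
    Returns two arrays of shape (C, H // patch, W // patch).
    """
    c, h, w = feat.shape
    # Split each spatial axis into (number of patches, patch size).
    blocks = feat.reshape(c, h // patch, patch, w // patch, patch)
    mean = blocks.mean(axis=(2, 4))
    std = blocks.std(axis=(2, 4))
    return mean, std

def local_normalize(feat, patch, eps=1e-5):
    """Remove the per-patch channel statistics from a (C, H, W) array."""
    mean, std = local_channel_stats(feat, patch)
    # Expand the patch statistics back to (C, H, W) by repetition.
    mean_full = np.repeat(np.repeat(mean, patch, axis=1), patch, axis=2)
    std_full = np.repeat(np.repeat(std, patch, axis=1), patch, axis=2)
    return (feat - mean_full) / (std_full + eps)

# Hypothetical usage: the same content under two "styles"
# (a per-pixel gain and offset) normalizes to nearly the same array.
rng = np.random.default_rng(0)
x_a = rng.normal(size=(3, 8, 8))
x_b = 2.0 * x_a + 3.0
n_a = local_normalize(x_a, patch=4)
n_b = local_normalize(x_b, patch=4)
```

Here `n_a` and `n_b` agree up to the small effect of `eps`, which is the sense in which low-order statistics act as a proxy for style in this toy setting.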

Paper Structure

This paper contains 31 sections, 19 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: Examples of two types of domain shift: (a) Cross-temporal domain shifts within a bitemporal image pair. (b) Cross-scene domain shifts between bitemporal image pairs acquired from different scenes.
  • Figure 2: (a) Visualization of channel-wise means and standard deviations between local-level bitemporal images. (b) Visualization of channel-wise means and standard deviations between local-level bitemporal features.
  • Figure 3: An overview of the proposed DonaNet. A pair of images ($x_A$, $x_B$) is first fed into the siamese difference (SD) network. The domain-specific style information in the extracted features is eliminated by the embedded domain difference removal (DDR) module. Then the Manhattan distance is calculated between the features of the two images to obtain multi-level difference features $\{\textbf{F}_i\}_{i=1}^{3}$. These features are aggregated to generate aggregated features $\textbf{F}_{agg}$. The cross-temporal style transformation (CTST) module produces a stylized image pair ($\hat{x}^{A\to B}$, $\hat{x}^{B\to A}$) via a statistics-based style transformation. The stylized image pair is then also fed into the SD network. The obtained result $\textbf{P}_{out}^{sty}$ is aligned with the result $\textbf{P}_{out}^{ori}$ of the original image pair in the output space by a cross-temporal consistency regularization (CTCR) loss $\mathcal{L}_{CTCR}$.
  • Figure 4: Trend of the assigned weights as a function of the positive-negative sample ratio, shown for 8000 images randomly sampled from the SVCD dataset.
  • Figure 5: Illustration of different structures: (a) The original learnable residual block structure in the SD network. (b) Global-to-local normalization of the domain difference removal (DDR) module. (c) Global-to-local whitening of the domain difference removal (DDR) module.
  • ...and 10 more figures
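
Two steps named in the Figure 3 caption, the statistics-based style transformation and the Manhattan-distance difference features, can be sketched in NumPy as follows. This is a hedged illustration, not the paper's CTST module: swapping global channel-wise mean and standard deviation (in the style of adaptive instance normalization) is an assumed form of the transformation, the element-wise absolute difference is an assumed form of the difference features, and all function names are hypothetical.

```python
import numpy as np

def channel_stats(x, eps=1e-5):
    """Channel-wise mean and std of a (C, H, W) array, kept broadcastable."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True) + eps
    return mean, std

def transfer_style(content, style):
    """Give `content` the channel-wise mean/std of `style` (assumed transform)."""
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    return (content - c_mean) / c_std * s_std + s_mean

def cross_temporal_pair(x_a, x_b):
    """Return (x_a in the style of x_b, x_b in the style of x_a)."""
    return transfer_style(x_a, x_b), transfer_style(x_b, x_a)

def manhattan_difference(f_a, f_b):
    """Element-wise absolute difference between two feature arrays.

    Summing the result over the channel axis gives a per-pixel L1 distance.
    """
    return np.abs(f_a - f_b)

# Hypothetical usage with random arrays standing in for a bitemporal pair.
rng = np.random.default_rng(0)
x_a = rng.normal(loc=0.0, scale=1.0, size=(3, 16, 16))
x_b = rng.normal(loc=2.0, scale=0.5, size=(3, 16, 16))
x_ab, x_ba = cross_temporal_pair(x_a, x_b)
diff = manhattan_difference(x_a, x_b)
```

After the swap, `x_ab` keeps the spatial content of `x_a` but carries the channel statistics of `x_b`. In the pipeline the caption describes, predictions for such a stylized pair would then be compared with predictions for the original pair by the consistency loss.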