Semantically Robust Unsupervised Image Translation for Paired Remote Sensing Images

Sheng Fang, Kaiyu Li, Zhe Li, Jianli Zhao, Xingli Zhang

TL;DR

This work tackles semantically robust, deterministic unsupervised image translation for bi-temporal remote sensing images by leveraging paired data. It introduces SRUIT, which enforces a shared latent space $\mathscr{Z}$ through weight-sharing of high-level layers and uses cross-cycle consistency to preserve semantics during translation between $\mathscr{A}$ and $\mathscr{B}$ without extra supervisory networks. Quantitative and qualitative results on season-variant RS datasets show SRUIT improves semantic preservation in change-detection tasks while delivering competitive perceptual quality, outperforming Cycle-GAN and GC-GAN in key semantic metrics. The approach offers practical value for change detection and RS analysis by enabling reliable, semantically faithful translation across time with limited supervision.

Abstract

Image translation for change detection or classification in bi-temporal remote sensing (RS) images poses a distinctive problem. Although paired images can be acquired, the task is still unsupervised. Moreover, translation must strictly preserve semantics rather than produce multimodal outputs. To address these problems, this paper proposes a new method, SRUIT (Semantically Robust Unsupervised Image-to-image Translation), which ensures semantically robust translation and produces deterministic output. Inspired by previous work, the method exploits the underlying characteristics of bi-temporal RS images and designs the networks accordingly. First, we assume that bi-temporal RS images share the same latent space, since they are always acquired from the same land location. SRUIT therefore makes the generators share their high-level layers, a constraint that compels the two domain mappings to fall into the same latent space. Second, considering that the land covers of bi-temporal images can evolve into each other, SRUIT uses cross-cycle-consistent adversarial networks to translate from one domain to the other and then recover the originals. Experimental results show that the weight-sharing and cross-cycle-consistency constraints yield translated images with both good perceptual quality and semantic preservation, even for image pairs with significant differences.
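
The two constraints named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the dense layers standing in for convolutional ones, the `Generator` class, the `build_generators` helper, and the L1 reconstruction terms are all assumptions made for illustration, and the adversarial losses are left out entirely. The sketch only shows how two generators might share their high-level layers and how a cross-cycle path (translate to the other domain, then translate back) could be scored.

```python
import random


def linear(weights, x):
    """Apply a dense layer, given as a list of rows, to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def rand_matrix(dim, rng):
    """Hypothetical initializer: a dim x dim matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]


def identity(dim):
    """Identity matrix, handy for sanity checks."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


class Generator:
    """Toy encoder/decoder pair for one domain (an assumption, not SRUIT itself).

    enc_low and dec_low are domain-specific; enc_high and dec_high are the
    high-level layers that both domains are meant to share.
    """

    def __init__(self, enc_low, enc_high, dec_high, dec_low):
        self.enc_low, self.enc_high = enc_low, enc_high
        self.dec_high, self.dec_low = dec_high, dec_low

    def encode(self, x):
        return linear(self.enc_high, linear(self.enc_low, x))

    def decode(self, z):
        return linear(self.dec_low, linear(self.dec_high, z))


def build_generators(dim, rng):
    """Create two generators whose high-level layers are the same objects."""
    shared_enc_high = rand_matrix(dim, rng)
    shared_dec_high = rand_matrix(dim, rng)
    gen_a = Generator(rand_matrix(dim, rng), shared_enc_high,
                      shared_dec_high, rand_matrix(dim, rng))
    gen_b = Generator(rand_matrix(dim, rng), shared_enc_high,
                      shared_dec_high, rand_matrix(dim, rng))
    return gen_a, gen_b


def l1(u, v):
    """Mean absolute difference between two vectors."""
    return sum(abs(p - q) for p, q in zip(u, v)) / len(u)


def cross_cycle_losses(gen_a, gen_b, a, b):
    """Return (direct reconstruction loss, cross-cycle loss) for one pair."""
    z_a, z_b = gen_a.encode(a), gen_b.encode(b)
    # Direct recovery: encode and decode within the same domain.
    recon = l1(gen_a.decode(z_a), a) + l1(gen_b.decode(z_b), b)
    # Cross translation: decode each latent code with the other decoder.
    fake_b = gen_b.decode(z_a)
    fake_a = gen_a.decode(z_b)
    # Cross-cycle: re-encode the translated sample and map it back.
    cycled_a = gen_a.decode(gen_b.encode(fake_b))
    cycled_b = gen_b.decode(gen_a.encode(fake_a))
    cycle = l1(cycled_a, a) + l1(cycled_b, b)
    return recon, cycle


if __name__ == "__main__":
    gen_a, gen_b = build_generators(4, random.Random(0))
    a, b = [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]
    print(cross_cycle_losses(gen_a, gen_b, a, b))
```

Because the shared layers are literally the same objects in both generators, any update to them affects both domain mappings at once, which is one simple way to tie the two encoders to a common latent space. In a real training loop these terms would be weighted and combined with adversarial losses, and the weights would be learned rather than random.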

Paper Structure

This paper contains 15 sections, 4 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Bi-temporal RS images with seasonal variation from the CDD dataset [Lebedev:ISPRS2018]. (a) and (b) are the original bi-temporal images taken in summer and winter; (c) is the ground-truth map of changed areas.
  • Figure 2: Illustration of bi-temporal image translation from the source domain to the latent space and then to the target domain when some land cover areas have changed between images $A$ and $B$. Where the two images have the same land covers, the translated sample should fall on the same point as the target in both the shared latent space and the target domain.
  • Figure 3: Cross-cycle translation between source domain, shared latent space, and target domain, taking domain $\mathscr{A}$ as the source and domain $\mathscr{B}$ as the target. The solid lines denote the processes of directly recovering $A$, and the dashed lines denote the main processes of cross-cycle translations.
  • Figure 4: (a) Generator of SRUIT. Each generator consists of one encoder and one decoder. The high-level layers of both the encoder and the decoder are weight-shared. For brevity, only the translation of $A$ is depicted. (b) Discriminator of SRUIT, taking $A$ as an example.
  • Figure 5: The season-varying CDD dataset [Lebedev:ISPRS2018]. Images in the top row are bi-temporal images of summer and fall with a minor difference; images in the bottom row are bi-temporal images of summer and winter with a major difference.
  • ...and 6 more figures