Semantically Robust Unsupervised Image Translation for Paired Remote Sensing Images
Sheng Fang, Kaiyu Li, Zhe Li, Jianli Zhao, Xingli Zhang
TL;DR
This work tackles semantically robust, deterministic unsupervised image translation for bi-temporal remote sensing (RS) images by leveraging paired data. It introduces SRUIT, which enforces a shared latent space $\mathscr{Z}$ by sharing the weights of the generators' high-level layers. It also uses cross-cycle consistency to preserve semantics when translating between domains $\mathscr{A}$ and $\mathscr{B}$, without extra supervisory networks. Quantitative and qualitative results on season-variant RS datasets show that SRUIT better preserves semantics for change detection while delivering competitive perceptual quality, outperforming Cycle-GAN and GC-GAN on key semantic metrics. The approach is practically useful for change detection and RS analysis because it enables reliable, semantically faithful translation across acquisition dates with limited supervision.
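The weight-sharing constraint can be illustrated with a minimal pure-Python sketch. As a simplification, it assumes each generator consists of a domain-specific low-level encoder layer, one shared high-level layer and a domain-specific low-level decoder layer. The classes `Linear` and `Generator` and the sizes `DIM` and `HID` are hypothetical, and all training losses are omitted. Both generators hold a reference to the same `shared` layer, so both mappings pass through one set of high-level weights. The text credits this mechanism with tying the two mappings to a common latent space $\mathscr{Z}$.

```python
import random


class Linear:
    """Tiny fully connected layer y = W x + b (hypothetical; no autograd)."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + bi
                for row, bi in zip(self.W, self.b)]


def relu(v):
    return [max(0.0, t) for t in v]


class Generator:
    """Domain-specific encoder -> shared high-level layer -> domain-specific decoder."""

    def __init__(self, enc, shared, dec):
        self.enc, self.shared, self.dec = enc, shared, dec

    def latent(self, x):
        # Code in the (assumed) common latent space Z.
        return relu(self.shared(relu(self.enc(x))))

    def __call__(self, x):
        return self.dec(self.latent(x))


rng = random.Random(0)
DIM, HID = 4, 8                     # hypothetical input and latent sizes
shared = Linear(HID, HID, rng)      # ONE high-level layer used by both generators
enc_A, enc_B = Linear(DIM, HID, rng), Linear(DIM, HID, rng)
dec_A, dec_B = Linear(HID, DIM, rng), Linear(HID, DIM, rng)

G_AB = Generator(enc_A, shared, dec_B)  # A -> Z -> B
G_BA = Generator(enc_B, shared, dec_A)  # B -> Z -> A

x = [0.1, 0.2, 0.3, 0.4]
print(len(G_AB.latent(x)), len(G_AB(x)))  # 8 4
```

Any change to `shared.W` or `shared.b` alters `G_AB` and `G_BA` at the same time. In a deep-learning framework, the analogous approach would be to pass a single module instance to both generators so that its parameters exist only once.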
Abstract
Image translation for change detection or classification in bi-temporal remote sensing images is a distinctive problem. Although paired images of the same location can be acquired, the translation task remains unsupervised. Moreover, it requires strict semantic preservation rather than multimodal outputs. To address these problems, this paper proposes a new method, SRUIT (Semantically Robust Unsupervised Image-to-image Translation), which ensures semantically robust translation and produces deterministic output. Inspired by previous work, the method exploits the underlying characteristics of bi-temporal remote sensing images and designs the networks accordingly. First, we assume that bi-temporal remote sensing images share the same latent space, because they are always acquired over the same land location. SRUIT therefore makes the two generators share their high-level layers, a constraint that compels both domain mappings into the same latent space. Second, because the land covers in bi-temporal images can evolve into each other, SRUIT uses cross-cycle-consistent adversarial networks to translate each image into the other domain and then recover it. Experimental results show that the weight-sharing and cross-cycle-consistency constraints yield translated images with good perceptual quality while preserving semantics, even when the two acquisitions differ significantly.
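The cross-cycle consistency constraint can be sketched as a round-trip reconstruction loss. This sketch assumes, as in CycleGAN-style training, that the term is an L1 distance between each image and its reconstruction after translating to the other date and back. The adversarial losses are omitted, images are flattened to lists of floats, and the names `cross_cycle_loss`, `G_AB`, `G_BA_exact` and `G_BA_lossy` are hypothetical toy stand-ins for the trained generators.

```python
from typing import Callable, List

Image = List[float]               # flattened toy "image"; real RS tiles are H x W x C
Mapping = Callable[[Image], Image]


def l1(x: Image, y: Image) -> float:
    """Mean absolute error between two equally sized images."""
    assert len(x) == len(y)
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


def cross_cycle_loss(a: Image, b: Image, G_AB: Mapping, G_BA: Mapping) -> float:
    """Round-trip reconstruction in both directions: A->B->A and B->A->B."""
    rec_a = G_BA(G_AB(a))  # translate a date-A image to date B, then back
    rec_b = G_AB(G_BA(b))  # translate a date-B image to date A, then back
    return l1(a, rec_a) + l1(b, rec_b)


# Toy generators: G_BA_exact inverts G_AB exactly, G_BA_lossy does not.
def G_AB(x: Image) -> Image:
    return [2.0 * v + 1.0 for v in x]


def G_BA_exact(x: Image) -> Image:
    return [(v - 1.0) / 2.0 for v in x]


def G_BA_lossy(x: Image) -> Image:
    return [(v - 1.0) / 2.5 for v in x]


a = [0.0, 0.5, 1.0]
b = [1.0, 2.0, 3.0]
print(cross_cycle_loss(a, b, G_AB, G_BA_exact))  # 0.0
print(cross_cycle_loss(a, b, G_AB, G_BA_lossy))
```

The loss vanishes only when each mapping undoes the other on the given images. During training, this pressure toward recoverable translations is what discourages the generators from discarding semantic content.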
