Decomposition-based Unsupervised Domain Adaptation for Remote Sensing Image Semantic Segmentation

Xianping Ma, Xiaokang Zhang, Xingchen Ding, Man-On Pun, Siwei Ma

TL;DR

De-GLGAN, a novel decomposition-based UDA framework, is developed to improve the cross-domain transferability and generalization capability of semantic segmentation models. Experiments on the ISPRS Potsdam/Vaihingen and LoveDA Rural/Urban benchmarks show that it outperforms existing state-of-the-art UDA methods.

Abstract

Unsupervised domain adaptation (UDA) techniques are vital for semantic segmentation in geosciences, effectively utilizing remote sensing imagery across diverse domains. However, most existing UDA methods, which focus on domain alignment at the high-level feature space, struggle to simultaneously retain local spatial details and global contextual semantics. To overcome these challenges, a novel decomposition scheme is proposed to guide domain-invariant representation learning. Specifically, multiscale high/low-frequency decomposition (HLFD) modules are proposed to decompose feature maps into high- and low-frequency components across different subspaces. This decomposition is integrated into a fully global-local generative adversarial network (GLGAN) that incorporates global-local transformer blocks (GLTBs) to enhance the alignment of decomposed features. By integrating the HLFD scheme and the GLGAN, a novel decomposition-based UDA framework called De-GLGAN is developed to improve the cross-domain transferability and generalization capability of semantic segmentation models. Extensive experiments on two UDA benchmarks, namely ISPRS Potsdam and Vaihingen, and LoveDA Rural and Urban, demonstrate the effectiveness and superiority of the proposed approach over existing state-of-the-art UDA methods. The source code for this work is accessible at https://github.com/sstary/SSRS.

Paper Structure (28 sections, 12 equations, 16 figures, 12 tables, 1 algorithm)

Figures (16)

  • Figure 1: The main challenges of UDA semantic segmentation in remote sensing images. Large scale variations of trees and buildings are observed within and across different domains. For example, the building highlighted in the left figure is much smaller than those in the right figure, while the trees highlighted in the left figure are connected and much larger than those on the right. Additionally, the boundaries of ground objects in both domains often exhibit varying styles and complexities.
  • Figure 2: The basic idea of our decomposition strategy. (a) Feature map extracted by the encoder; (b) feature map extracted further by MHSA and (c) its corresponding frequency spectrum; (d) feature map extracted further by Conv and (e) its corresponding frequency spectrum. The center of each spectrum represents low-frequency information, and points farther from the center represent higher-frequency information. More high-value points at a given frequency indicate more information at that frequency.
  • Figure 3: (a) GLTB proposed in UNetformer [wang2022unetformer] and (b) the proposed HLFD module. They have similar structures, both based on MHSA and Conv. However, the module structure and the optimization goal of the network give them different functions: the GLTB focuses on global-local contextual information extraction, while the HLFD module handles cross-domain feature decomposition.
  • Figure 4: Overview of the proposed De-GLGAN, which comprises a Generator, a GLDis, and the multiscale HLFDs. The Generator adopts the classic encoder-decoder structure, which extracts image features and predicts pixel-wise category labels. The multiscale HLFDs align cross-domain representations by decomposing the multiscale features generated by the encoder. The GLDis further learns domain-invariant representations through an adversarial learning strategy.
  • Figure 5: The detailed structure of the encoder and the SwinTB. The encoder extracts image features by stacking SwinTBs based on MHSA. By loading the pre-trained weights, it can effectively extract cross-domain multiscale image features.
  • ...and 11 more figures
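
The frequency-spectrum reading described in the Figure 2 caption (low frequencies at the center of the spectrum, higher frequencies farther from it) can be illustrated with a small NumPy sketch. This is not the HLFD module, which the captions describe as built from MHSA and Conv; it uses a fixed radial FFT mask purely for illustration. The function names, the `radius` parameter, and the synthetic 16x16 maps are all assumptions introduced here and do not come from the paper or its repository.

```python
import numpy as np


def centered_spectrum(feature_map):
    """2-D FFT shifted so the zero-frequency (DC) term sits at the center."""
    return np.fft.fftshift(np.fft.fft2(feature_map))


def radial_distance(shape):
    """Distance of every spectrum bin from the center bin."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)


def low_frequency_ratio(feature_map, radius):
    """Share of spectral energy lying within `radius` bins of the center."""
    energy = np.abs(centered_spectrum(feature_map)) ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    near_center = radial_distance(feature_map.shape) <= radius
    return float(energy[near_center].sum() / total)


def split_by_frequency(feature_map, radius):
    """Split a map into low- and high-frequency parts with a hard radial mask.

    Illustrative stand-in only: a fixed mask, not a learned decomposition.
    """
    spectrum = centered_spectrum(feature_map)
    low_mask = radial_distance(feature_map.shape) <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feature_map - low  # the two parts sum back to the input
    return low, high


# Synthetic single-channel "feature maps" (hypothetical data).
size = 16
yy, xx = np.mgrid[:size, :size]
smooth = np.cos(2 * np.pi * xx / size)               # one cycle: low frequency
checker = np.where((yy + xx) % 2 == 0, 1.0, -1.0)    # flips every pixel: high frequency

print("smooth  low-frequency ratio:", low_frequency_ratio(smooth, radius=2))
print("checker low-frequency ratio:", low_frequency_ratio(checker, radius=2))

# Decompose a mixture and check that each part recovers one ingredient.
low, high = split_by_frequency(smooth + checker, radius=2)
print("low part matches smooth:  ", np.allclose(low, smooth))
print("high part matches checker:", np.allclose(high, checker))
```

For the smooth map nearly all spectral energy sits next to the center, while for the checkerboard it sits at the corner of the shifted spectrum, which is the contrast the caption describes between panels (c) and (e). A hard mask is the simplest possible choice here; a learned decomposition would not be expected to produce such a clean separation.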