Semantic Change Detection with Asymmetric Siamese Networks

Kunping Yang, Gui-Song Xia, Zicheng Liu, Bo Du, Wen Yang, Marcello Pelillo, Liangpei Zhang

TL;DR

This work addresses semantic change detection in multi-temporal aerial imagery by introducing an Asymmetric Siamese Network (ASN) that leverages heterogeneous feature extraction via the Asymmetric Spatial Pyramid (aSP) and Asymmetric Representation Pyramid (aRP) to capture changes across diverse land-cover distributions. To train and evaluate robustly, the authors create the SECOND dataset, a large-scale, richly annotated benchmark with 30 change types across 6 land-cover classes, including changes between the same class. They further introduce Adaptive Threshold Learning (ATL) to mitigate label-imbalance effects and Separated Kappa (SeK) to provide a more human-aligned evaluation. Empirical results show ASN-ATL consistently outperforms state-of-the-art methods across multiple backbones and testing strategies, with ablations confirming the value of the asymmetric modules and threshold adaptation. The work advances SCD by addressing semantic ambiguity, providing a new dataset, and delivering improved change localization and typing in complex aerial scenes.

Abstract

Given two multi-temporal aerial images, semantic change detection aims to locate land-cover variations and identify their change types with pixel-wise boundaries. This problem is vital in many Earth-vision tasks, such as precise urban planning and natural resource management. Existing state-of-the-art algorithms mainly identify changed pixels by applying homogeneous operations to each input image and comparing the extracted features. However, in changed regions, totally different land-cover distributions often require heterogeneous feature extraction procedures for each input. In this paper, we present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures, which cover areas of various sizes and apply different numbers of parameters to account for the discrepancy across different land-cover distributions. To better train and evaluate our model, we create a large-scale, well-annotated SEmantic Change detectiON Dataset (SECOND), and we propose an Adaptive Threshold Learning (ATL) module and a Separated Kappa (SeK) coefficient to alleviate the influence of label imbalance in model training and evaluation. The experimental results demonstrate that the proposed model consistently outperforms state-of-the-art algorithms with different encoder backbones.

Paper Structure

This paper contains 26 sections, 20 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: In aerial images, land-cover objects appear with different geometric structures and mixed distributions across multi-temporal images. These variations, which we call asymmetric changes, make it difficult to locate and analyze land-cover variations with existing methods that process each input image homogeneously. In contrast with existing methods, we are motivated to design heterogeneous image processing, which we call locally asymmetric, to account for the discrepancy across different land-cover distributions and provide extra information for the SCD problem.
  • Figure 2: Asymmetric Siamese Network (ASN) for SCD. ASN uses siamese encoders to map the input multi-temporal images into feature space, while the siamese decoders produce the semantic maps. Similarly, the encoder and decoders in the change detection branch are designed to obtain the change map. In contrast to a traditional siamese network, ASN uses several convolutional sequences and squeeze gates in the proposed aSP and aRP to obtain feature pairs derived from widely different structures, which we call asymmetric feature pairs, to provide extra information. Furthermore, the designed ATL adaptively corrects output deviations based on combinations of the raw model outputs through a few lightweight extra convolutional layers.
  • Figure 3: The proposed aSP module with a length of 1. Index $k$ controls the number of channels of the layers in each convolution sequence, while $j_1$ and $j_2$ indicate different receptive fields. Each squeeze gate consists of a concatenation operator, convolution layers, and skip connections. aSP exploits asymmetric spatial feature pairs with diverse spatial information.
  • Figure 4: The proposed aRP. Indices $k_1$ and $k_2$ indicate different feature representation capabilities. Receiving spatial feature pyramids from aSP, aRP fuses asymmetric representation feature pairs with various representation capabilities.
  • Figure 5: Several samples from our proposed SECOND dataset. White indicates non-change regions, while other colors indicate different land-cover classes. Ground truth for SCD can be obtained by comparing the annotated land-cover classes.
  • ...and 7 more figures
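
The Figure 5 caption notes that SCD ground truth can be obtained by comparing the annotated land-cover classes of the two images. The following is a minimal sketch of that comparison, not the authors' code. It assumes each annotation is a grid of integer labels in which 0 marks a non-change pixel and 1 to 6 mark the six land-cover classes; the function name `change_type_map` and the integer encoding of change types are hypothetical choices made for illustration.

```python
NUM_CLASSES = 6  # number of land-cover classes, as stated for SECOND
NO_CHANGE = 0    # assumed label for non-change (white) pixels


def change_type_map(before, after, num_classes=NUM_CLASSES):
    """Combine two per-pixel land-cover maps into one change-type map.

    Unchanged pixels stay 0. A changed pixel going from class c1 to
    class c2 gets the id 1 + (c1 - 1) * num_classes + (c2 - 1), so every
    ordered (c1, c2) pair, including c1 == c2, maps to a distinct id.
    """
    result = []
    for row_before, row_after in zip(before, after):
        out_row = []
        for c1, c2 in zip(row_before, row_after):
            if c1 == NO_CHANGE or c2 == NO_CHANGE:
                out_row.append(NO_CHANGE)
            else:
                out_row.append(1 + (c1 - 1) * num_classes + (c2 - 1))
        result.append(out_row)
    return result


# Toy 2x2 annotations (hypothetical values, not taken from the dataset).
before = [[0, 1], [2, 5]]
after = [[0, 3], [2, 6]]
print(change_type_map(before, after))
```

Because unchanged pixels carry their own label in this encoding, a changed pixel may have the same class in both maps, which is how a change between the same land-cover class would be represented here.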