Semantic Change Detection with Asymmetric Siamese Networks
Kunping Yang, Gui-Song Xia, Zicheng Liu, Bo Du, Wen Yang, Marcello Pelillo, Liangpei Zhang
TL;DR
This work addresses semantic change detection (SCD) in multi-temporal aerial imagery by introducing an Asymmetric Siamese Network (ASN). ASN extracts heterogeneous features through an Asymmetric Spatial Pyramid (aSP) and an Asymmetric Representation Pyramid (aRP) to capture changes across diverse land-cover distributions. To train and evaluate the model, the authors create the SECOND dataset, a large-scale, richly annotated benchmark with 30 change types across 6 land-cover classes, including changes within the same class. They further introduce Adaptive Threshold Learning (ATL) to mitigate the effects of label imbalance during training, and the Separated Kappa (SeK) coefficient to provide an evaluation that is less affected by label imbalance and better aligned with human judgment. Empirical results show that ASN-ATL consistently outperforms state-of-the-art methods across multiple backbones and testing strategies, and ablations confirm the value of the asymmetric modules and threshold adaptation. The work advances SCD by addressing semantic ambiguity, providing a new dataset, and delivering improved change localization and typing in complex aerial scenes.
Abstract
Given two multi-temporal aerial images, semantic change detection aims to locate land-cover variations and identify their change types with pixel-wise boundaries. This problem is vital in many Earth-vision tasks, such as precise urban planning and natural resource management. Existing state-of-the-art algorithms mainly identify changed pixels by applying homogeneous operations to each input image and comparing the extracted features. However, in changed regions, totally different land-cover distributions often require a heterogeneous feature extraction procedure for each input. In this paper, we present an asymmetric siamese network (ASN) that locates and identifies semantic changes through feature pairs obtained from modules of widely different structures, which cover areas of various sizes and use different numbers of parameters to account for the discrepancy across land-cover distributions. To better train and evaluate our model, we create a large-scale, well-annotated SEmantic Change detectiON Dataset (SECOND), and we propose an Adaptive Threshold Learning (ATL) module and a Separated Kappa (SeK) coefficient to alleviate the influence of label imbalance in model training and evaluation, respectively. The experimental results demonstrate that the proposed model consistently outperforms state-of-the-art algorithms with different encoder backbones.
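To make the task definition concrete, the following is a minimal sketch of what a semantic change detection result encodes: a per-pixel change mask plus a (from, to) land-cover pair for every changed pixel. It is an illustration only, not the ASN model. The label names, the function `summarize_changes`, and the toy 2x3 maps are all hypothetical. The change mask is passed explicitly rather than derived from label inequality, since a change may occur within the same class.

```python
from collections import Counter

# Hypothetical label set for illustration; not the dataset's actual classes.
LABELS = {0: "water", 1: "tree", 2: "building"}


def summarize_changes(before, after, changed):
    """Count (from, to) land-cover transitions over pixels flagged as changed.

    before, after: 2D lists of integer class labels for the two dates.
    changed: 2D list of booleans marking changed pixels. It is an explicit
    input because a change can occur between regions of the same class.
    """
    counts = Counter()
    for row_b, row_a, row_c in zip(before, after, changed):
        for b, a, c in zip(row_b, row_a, row_c):
            if c:
                counts[(LABELS[b], LABELS[a])] += 1
    return counts


# Toy 2x3 label maps for the two acquisition dates (assumed values).
before = [[0, 0, 1],
          [1, 2, 2]]
after = [[0, 2, 1],
         [2, 2, 2]]
changed = [[False, True, False],
           [True, False, True]]

result = summarize_changes(before, after, changed)
for (src, dst), n in sorted(result.items()):
    print(f"{src} -> {dst}: {n} pixel(s)")
```

In this toy input, the last changed pixel is labeled `building` at both dates, which is why a mask computed as `before != after` would miss it.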
