Band Prompting Aided SAR and Multi-Spectral Data Fusion Framework for Local Climate Zone Classification
Haiyan Lan, Shujun Li, Mingjie Xie, Xuanjia Zhao, Hongning Liu, Pengming Feng, Dongli Xu, Guangjun He, Jian Guan
TL;DR
This work tackles local climate zone (LCZ) classification by fusing SAR and multi-spectral data through a band-aware, text-guided framework called BP-LCZ. It introduces band grouping to decompose the multimodal data, a band group prompting (BGP) strategy to align band-group representations with descriptive text prompts, and a multivariate supervised matrix (MSM) to reduce confusion between positive and negative samples in contrastive learning. Experiments on the So2Sat LCZ42 dataset show gains for remote-sensing-specific architectures: both EB-CNN and ExViT improve in overall accuracy (OA) and Kappa when equipped with BP-LCZ, and ablation studies confirm that BGP and MSM are complementary. The approach advances multimodal fusion in remote sensing by using textual prompts to encode the physical properties of bands and the semantics of categories, which offers practical gains for urban climate mapping. Domain shift across geographic regions remains an open challenge.
Abstract
Local climate zone (LCZ) classification is of great value for understanding the complex interactions between urban development and local climate. Recent studies have increasingly focused on fusing synthetic aperture radar (SAR) and multi-spectral data to improve LCZ classification performance. However, this fusion remains challenging because the two types of data have distinct physical properties and effective fusion guidance is lacking. In this paper, a novel band prompting aided data fusion framework, BP-LCZ, is proposed for LCZ classification. It uses textual prompts associated with band groups to guide the model in learning the physical attributes of different bands and the semantics of the categories inherent in SAR and multi-spectral data, which augments the fused feature and thus improves LCZ classification performance. Specifically, a band group prompting (BGP) strategy is introduced to align visual representations at the level of band groups, which also allows the textual information to support a fuller extraction of the semantic information in different bands. In addition, a training strategy based on a multivariate supervised matrix (MSM) is proposed to alleviate the confusion between positive and negative samples by completing the supervised information. The experimental results demonstrate the effectiveness and superiority of the proposed data fusion framework.
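The positive/negative confusion that MSM targets can be illustrated with a small sketch. In image-text contrastive training, the usual supervision is an identity matrix over the batch, so two samples of the same LCZ class are treated as negatives of each other. One plausible way to "complete" that supervision, assumed here and not confirmed by the abstract, is to mark every same-class pair as positive. The function names, the row normalisation, and the temperature value below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def multi_positive_targets(labels):
    """Build a batch-by-batch target matrix from class labels.

    Assumption: entry (i, j) is positive whenever samples i and j share
    a class; each row is normalised to sum to 1.
    """
    labels = np.asarray(labels)
    same_class = (labels[:, None] == labels[None, :]).astype(np.float64)
    return same_class / same_class.sum(axis=1, keepdims=True)


def soft_target_contrastive_loss(img_emb, txt_emb, targets, temperature=0.07):
    """Cross-entropy between image-to-text similarities and soft targets."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_prob).sum(axis=1).mean())


# Samples 0 and 2 share a class, so they are positives of each other.
targets = multi_positive_targets([0, 1, 0])
```

With all-distinct labels the target matrix reduces to the identity, which recovers the standard one-positive-per-row contrastive objective. The difference appears only when a batch contains repeated classes, which is the common case for a dataset with a small, fixed set of LCZ categories.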
