
DF4LCZ: A SAM-Empowered Data Fusion Framework for Scene-Level Local Climate Zone Classification

Qianqian Wu, Xianping Ma, Jialu Sui, Man-On Pun

TL;DR

A novel data fusion approach that combines high-resolution Google imagery, which provides ground object priors, with Sentinel-2 multispectral imagery, and uses a graph convolutional network module powered by the Segment Anything Model (SAM) to improve feature extraction from the Google imagery.

Abstract

Recent advancements in remote sensing (RS) technologies have shown potential for accurately classifying local climate zones (LCZs). However, traditional scene-level methods using convolutional neural networks (CNNs) often struggle to integrate prior knowledge of ground objects effectively. Moreover, commonly used data sources such as Sentinel-2 have difficulty capturing detailed ground object information. To tackle these challenges, we propose a data fusion method that integrates ground object priors extracted from high-resolution Google imagery with Sentinel-2 multispectral imagery. The proposed method introduces a novel Dual-stream Fusion framework for LCZ classification (DF4LCZ), which integrates instance-based location features from Google imagery with scene-level spatial-spectral features extracted from Sentinel-2 imagery. The framework incorporates a Graph Convolutional Network (GCN) module empowered by the Segment Anything Model (SAM) to enhance feature extraction from Google imagery. Simultaneously, it employs a 3D-CNN architecture to learn the spectral-spatial features of Sentinel-2 imagery. Experiments on a multi-source remote sensing image dataset specifically designed for LCZ classification validate the effectiveness of the proposed DF4LCZ. The related code and dataset are available at https://github.com/ctrlovefly/DF4LCZ.
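This summary does not spell out how the two streams are combined. As a minimal sketch, assuming feature-level fusion, the pooled 32-dimensional GCN embedding (Figure 3) and the pooled 256-dimensional 3D ResNet embedding (Figure 4) could be concatenated and passed through one fully connected layer and a softmax over the 17 LCZ classes (ten built types plus seven land cover types, Figure 1). The function name `fuse_and_classify`, the random weights, and the concatenation rule are our assumptions, not the authors' code. Because each stream in Figures 3 and 4 ends in its own softmax, decision-level fusion (for example, averaging the two probability vectors) is an equally plausible reading.

```python
import numpy as np

NUM_LCZ_CLASSES = 17  # ten built types (1-10) plus seven land cover types (A-G)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(google_feat, s2_feat, W, b):
    """Hypothetical feature-level fusion: concatenate the two stream embeddings,
    then apply one fully connected layer and a softmax."""
    fused = np.concatenate([google_feat, s2_feat], axis=-1)
    return softmax(fused @ W + b)

rng = np.random.default_rng(0)
batch = 4
g = rng.normal(size=(batch, 32))    # stand-in for pooled GCN embeddings (Google stream)
s = rng.normal(size=(batch, 256))   # stand-in for pooled 3D ResNet embeddings (Sentinel-2 stream)
W = rng.normal(scale=0.01, size=(32 + 256, NUM_LCZ_CLASSES))
b = np.zeros(NUM_LCZ_CLASSES)
probs = fuse_and_classify(g, s, W, b)
print(probs.shape)           # (4, 17)
print(probs.argmax(axis=1))  # predicted LCZ class index per scene
```

Concatenation keeps both embeddings intact and lets the classifier learn how much to trust each stream; the weights here are untrained, so only the shapes and the normalization are meaningful.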

Paper Structure (26 sections, 8 equations, 11 figures, 5 tables)


Figures (11)

  • Figure 1: Illustration and a brief definition of LCZ types. There are ten built types (1–10) and seven land cover types (A–G).
  • Figure 2: Illustration of the DF4LCZ framework, composed of three main modules: (1) a Google Earth stream focusing on instance-based location feature extraction; (2) a Sentinel-2 stream dedicated to scene-level spatial-spectral feature extraction; and (3) a fusion and classification module.
  • Figure 3: Structure of the GCN used in this study. (a) Input, layers, and output: The input to the GCN is the graph produced by the graph construction procedure. The network consists of three graph convolutional layers with skip connections (GCSConv), each producing a 32-dimensional embedding vector $h^{(k+1)}$ for every node, followed by a global average pooling layer, a fully connected layer, and a softmax layer. The network outputs a vector representing a probability distribution over all possible classes. (b) GCSConv layer: The GCSConv layer embeds input node features in two stages, namely feature aggregation and updating.
  • Figure 4: Structure of the 3D ResNet11 network used in this study. (a) Input, layers, and output: The input to the 3D ResNet11 network is a Sentinel-2 image patch. The network consists of an initial convolutional layer with 64 output channels, followed by a batch normalization layer and a ReLU. Three 3D residual blocks then extract spatial-spectral features, followed by a global average pooling layer, a fully connected layer, and a softmax layer. The network outputs a vector representing a probability distribution over all possible classes. (b) 3D residual block: Each 3D residual block contains a sequence of 3D convolutional layers followed by batch normalization and rectified linear unit (ReLU) activation functions, with a residual connection within each block. The three blocks use 64, 128, and 256 filters, respectively.
  • Figure 5: An illustration of our study area, spanning eight cities in Southeast China: Guangzhou, Hangzhou, Hefei, Hong Kong, Nanchang, Nanjing, Shanghai, and Wuhan.
  • ...and 6 more figures
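Figure 3 refers to a graph construction procedure without detailing it here. Under purely illustrative assumptions, one way to turn SAM instance masks into a graph is to treat each mask as a node whose features are its normalized centroid (a simple "location" feature) and relative area, and to link each node to its k nearest neighbours by centroid distance. The function `masks_to_graph`, the feature choice, and the k-nearest-neighbour rule are hypothetical, not the authors' procedure; in practice the boolean masks might come from the `segmentation` field of SAM's automatic mask generator output.

```python
import numpy as np

def masks_to_graph(masks, k=2):
    """Hypothetical graph construction from instance masks.
    masks: (N, H, W) boolean array, one non-empty mask per segmented instance.
    Returns node features X (N, 3) and a symmetric adjacency A (N, N)."""
    n, h, w = masks.shape
    feats = []
    for m in masks:
        rows, cols = np.nonzero(m)
        # normalised centroid (row, col) and relative area of the instance
        feats.append([rows.mean() / h, cols.mean() / w, m.sum() / (h * w)])
    X = np.array(feats)
    cent = X[:, :2]
    dist = np.linalg.norm(cent[:, None, :] - cent[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # a node is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :min(k, n - 1)]
    A = np.zeros((n, n))
    for i, js in enumerate(nbrs):
        A[i, js] = 1.0                          # edge to each of the k nearest instances
    A = np.maximum(A, A.T)                      # symmetrise: undirected graph, no self-loops
    return X, A

masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 0:2, 0:2] = True                       # small top-left object
masks[1, 0:2, 6:8] = True                       # small top-right object
masks[2, 6:8, 0:6] = True                       # bottom strip
X, A = masks_to_graph(masks, k=1)
print(A)
```

With k=1 in this toy example, the two top objects are each other's nearest neighbour and the bottom strip links to the top-left object, so the resulting undirected graph has two edges: 0–1 and 0–2.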
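The GCN of Figure 3 can be sketched in NumPy, assuming the GCSConv formulation used, to our understanding, by the Spektral library: $Z = \hat{A} X W_1 + X W_2 + b$ with $\hat{A} = D^{-1/2} A D^{-1/2}$ and no self-loops in $A$. The $\hat{A} X W_1$ term performs feature aggregation over neighbours, and the skip term $X W_2$ lets each node update from its own features. The ReLU activation, the random weights, the 5-node toy graph, and the helper names are assumptions for illustration; only the three 32-dimensional layers, global average pooling, fully connected layer, and softmax come from the caption.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def gcs_conv(X, A_norm, W1, W2, b):
    """GCS-style layer: neighbour aggregation (A_norm @ X @ W1) plus a
    skip term from each node's own features (X @ W2), then ReLU (assumed)."""
    return np.maximum(A_norm @ X @ W1 + X @ W2 + b, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gcn_forward(X, A, layers, W_fc, b_fc):
    A_norm = normalized_adjacency(A)
    h = X
    for W1, W2, b in layers:         # three GCS layers, 32-d node embeddings each
        h = gcs_conv(h, A_norm, W1, W2, b)
    g = h.mean(axis=0)               # global average pooling over nodes
    return softmax(g @ W_fc + b_fc)  # fully connected layer + softmax

rng = np.random.default_rng(0)
n_nodes, in_dim, hid, n_classes = 5, 4, 32, 17
X = rng.normal(size=(n_nodes, in_dim))          # hypothetical node features
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)    # symmetric, no self-loops
dims = [in_dim, hid, hid, hid]
layers = [(rng.normal(scale=0.1, size=(dims[i], dims[i + 1])),
           rng.normal(scale=0.1, size=(dims[i], dims[i + 1])),
           np.zeros(dims[i + 1])) for i in range(3)]
W_fc = rng.normal(scale=0.1, size=(hid, n_classes))
b_fc = np.zeros(n_classes)
probs = gcn_forward(X, A, layers, W_fc, b_fc)
print(probs.shape)  # (17,)
```

Keeping a separate weight matrix for the skip term means a node's own features are not diluted by averaging with its neighbours, which is the motivation for the "skip connection" in GCSConv.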
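A forward-pass sketch of the 3D ResNet11 stream in Figure 4 follows, treating the spectral bands as the depth axis of a single-channel 3D volume. The 3×3×3 kernels, stride 1, zero "same" padding, 1×1×1 projection shortcuts, simplified batch normalization (per-sample statistics, no learned scale or shift), and three convolutions per block are our assumptions. Three convolutions per block would give 1 + 3×3 + 1 = 11 weighted layers (not counting shortcut projections), consistent with the name ResNet11, but the caption does not confirm this. The weights are random and the 10-band 6×6 patch is hypothetical, so the output only demonstrates shapes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv3d_same(x, w):
    """Stride-1 3D convolution (cross-correlation) with zero 'same' padding via im2col.
    x: (C_in, D, H, W); w: (C_out, C_in, k, k, k) with odd k."""
    c_out, c_in, k = w.shape[0], w.shape[1], w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    win = sliding_window_view(xp, (k, k, k), axis=(1, 2, 3))  # (C_in, D, H, W, k, k, k)
    d, h, wd = win.shape[1:4]
    cols = win.transpose(1, 2, 3, 0, 4, 5, 6).reshape(d * h * wd, c_in * k ** 3)
    out = cols @ w.reshape(c_out, -1).T                        # (D*H*W, C_out)
    return out.T.reshape(c_out, d, h, wd)

def batch_norm(x, eps=1e-5):
    """Simplified BN: per-channel standardisation over (D, H, W), no learned scale/shift."""
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, kernels, w_skip=None):
    """Conv-BN(-ReLU) sequence plus a residual connection; a 1x1x1 projection
    matches channel counts on the shortcut when the filter count changes."""
    out = x
    for i, w in enumerate(kernels):
        out = batch_norm(conv3d_same(out, w))
        if i < len(kernels) - 1:           # last ReLU is applied after the addition
            out = relu(out)
    shortcut = x if w_skip is None else batch_norm(conv3d_same(x, w_skip))
    return relu(out + shortcut)

def he(c_out, c_in, k):
    """Random He-style kernel (demo weights only, not trained)."""
    return rng.normal(scale=np.sqrt(2.0 / (c_in * k ** 3)), size=(c_out, c_in, k, k, k))

def resnet11_forward(patch, n_classes=17, widths=(64, 128, 256), convs_per_block=3):
    # initial conv with 64 output channels, then BN and ReLU
    x = relu(batch_norm(conv3d_same(patch, he(widths[0], patch.shape[0], 3))))
    c_in = widths[0]
    for c_out in widths:                   # three residual blocks: 64, 128, 256 filters
        kernels = [he(c_out, c_in if j == 0 else c_out, 3) for j in range(convs_per_block)]
        w_skip = None if c_in == c_out else he(c_out, c_in, 1)
        x = residual_block(x, kernels, w_skip)
        c_in = c_out
    g = x.mean(axis=(1, 2, 3))             # global average pooling -> (256,)
    logits = g @ rng.normal(scale=0.05, size=(c_in, n_classes))  # fully connected layer
    e = np.exp(logits - logits.max())      # softmax
    return e / e.sum()

patch = rng.normal(size=(1, 10, 6, 6))     # hypothetical: 1 channel, 10 bands, 6x6 pixels
probs = resnet11_forward(patch)
print(probs.shape)  # (17,)
```

Using 3D kernels that slide along the band axis as well as the two spatial axes is what lets the stream learn joint spectral-spatial patterns, rather than treating the bands as independent input channels as a 2D CNN would.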