
LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang

TL;DR

LOGCAN++ targets three challenges in remote sensing semantic segmentation (complex backgrounds, scale and orientation variability, and large intra-class variance) with a local-global class-aware framework that combines a Global Class Awareness (GCA) module with multiple Local Class Awareness (LCA) modules. The LCA modules derive local class centers through an affine transform block (ATB) that adapts to object size, shape, and orientation; these local centers act as intermediaries that indirectly align pixels with the global class centers, improving intra-class compactness. Experiments on Vaihingen, Potsdam, and LoveDA show that LOGCAN++ outperforms mainstream general and remote sensing segmentation methods with a favorable speed-accuracy trade-off, and ablations validate the contributions of GCA, LCA, ATB, patch design, and multi-head attention. The results suggest LOGCAN++ is a practical approach for high-resolution remote sensing segmentation that could also be integrated into broader frameworks such as SAM.

Abstract

Remote sensing images are usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully address these issues, and thus their performance on remote sensing image segmentation is limited. In this paper, we propose LOGCAN++, a semantic segmentation model customized for remote sensing images, which is made up of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The GCA module captures global representations for class-level context modeling to reduce the interference of background noise. The LCA module generates local class representations as intermediate perceptual elements to indirectly associate pixels with the global class representations, with the aim of handling the large intra-class variance problem. In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations to effectively tolerate scale and orientation variations in remote sensing images. Extensive experiments on three benchmark datasets show that LOGCAN++ outperforms current mainstream general and remote sensing semantic segmentation methods and achieves a better trade-off between speed and accuracy. Code is available at https://github.com/xwmaxwma/rssegmentation.
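
To make the idea of a class representation (class center) concrete, the following is a minimal sketch, not the authors' implementation. It assumes a class center is a weighted average of pixel features, with weights given by a softmax over pixels of a coarse per-class score map; this weighting scheme and the names `softmax` and `class_centers` are illustrative assumptions and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_centers(features, logits):
    """Aggregate pixel features into one center per class (hypothetical helper).

    features: N pixel feature vectors, each of length C.
    logits:   N per-pixel class score vectors, each of length K.
    Returns K centers, each of length C.

    Assumption: the weights for class k are a softmax over all pixels of
    that class's scores, so each center is a convex combination of features.
    """
    num_classes = len(logits[0])
    channels = len(features[0])
    centers = []
    for k in range(num_classes):
        weights = softmax([pixel_scores[k] for pixel_scores in logits])
        center = [
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(channels)
        ]
        centers.append(center)
    return centers
```

Under this assumption, applying the function to all pixels of an image gives global centers, while applying it to the pixels of a single window gives local centers for that window.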

Paper Structure (32 sections, 10 equations, 9 figures, 9 tables)

This paper contains 32 sections, 10 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Visual examples of remote sensing image characteristics. The images at the bottom are selected from the ISPRS Vaihingen, ISPRS Potsdam and the LoveDA datasets. The images in the red, yellow and blue boxes represent the characteristics of remote sensing images with complex backgrounds, scale and orientation variations, and large intra-class variance, respectively.
  • Figure 2: Architecture of the proposed LOGCAN++, which consists of a backbone (ResNet-50 by default), local class aware (LCA) modules and a global class aware (GCA) module. The GCA module generates global class centers for class-level context modeling to reduce background noise interference. The LCA module generates local class centers based on the affine transform block (ATB) and applies them as intermediate perceptual elements to indirectly correlate the pixels with the global class centers, mitigating the intra-class variance.
  • Figure 3: Structural details of the affine transform block (ATB). ATB first pools and projects features to produce scaling factors $\Psi$, offset factors $\Delta$, and rotation factors $\Theta$. These factors convert the default local window into a target quadrilateral to accommodate geospatial objects of different sizes, shapes, and orientations. As a result, the local class centers generated by ATB are data-conditional and thus cope well with scale and orientation variations in remote sensing images.
  • Figure 4: Visualization of features output by the last layer of FarSeg, UNetFormer and LOGCAN++. The test image is selected from the ISPRS Vaihingen dataset. The features are visualized with t-SNE.
  • Figure 5: Qualitative comparison between LOGCAN++ and other state-of-the-art methods on the Vaihingen test set. The red dashed box marks the area of focus. Best viewed in color and zoomed in.
  • ...and 4 more figures
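
The window transformation described in the Figure 3 caption can be illustrated with a small geometric sketch. It assumes one scale pair, one offset pair, and one rotation angle per window, applied in the order scale, rotate, translate; this ordering and the function name `transform_window` are assumptions made for illustration, and the exact parameterization used by ATB is not reproduced here. With these assumptions the result is a rotated rectangle, which is a special case of the target quadrilateral mentioned in the caption.

```python
import math


def transform_window(cx, cy, width, height,
                     scale=(1.0, 1.0), offset=(0.0, 0.0), theta=0.0):
    """Return the four corners of a window after scaling, rotation and shift.

    (cx, cy):      center of the default window.
    width, height: size of the default window.
    scale:         per-axis scaling factors (stand-in for Psi).
    offset:        shift of the window center (stand-in for Delta).
    theta:         rotation angle in radians (stand-in for Theta).
    Corners are returned in the order: top-left, top-right,
    bottom-right, bottom-left of the untransformed window.
    """
    half_w = 0.5 * width * scale[0]
    half_h = 0.5 * height * scale[1]
    corners = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    transformed = []
    for x, y in corners:
        # Rotate about the window center, then move to the shifted center.
        rx = cos_t * x - sin_t * y
        ry = sin_t * x + cos_t * y
        transformed.append((cx + offset[0] + rx, cy + offset[1] + ry))
    return transformed
```

For example, `transform_window(0, 0, 2, 2, scale=(2.0, 1.0), offset=(3.0, 0.0))` stretches the window horizontally and shifts it to the right without rotating it. In a network, features would then be sampled inside the transformed region (for instance by bilinear interpolation) before computing local class centers; that sampling step is outside the scope of this sketch.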