LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery
Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang
TL;DR
LOGCAN++ targets three difficulties in remote sensing semantic segmentation (complex backgrounds, scale and orientation variability, and large intra-class variance) with a local-global class-aware framework that combines a Global Class Awareness (GCA) module with multiple Local Class Awareness (LCA) modules. Each LCA module derives local class centers through an affine transformation block (ATB) that adapts to object size, shape, and orientation; pixels are then aligned with the global class centers indirectly, via these local centers, which improves intra-class compactness. Experiments on Vaihingen, Potsdam, and LoveDA report state-of-the-art accuracy with a favorable speed-accuracy trade-off, and ablations validate the contributions of GCA, LCA, ATB, patch design, and multi-head attention. The results suggest that LOGCAN++ is a practical, scalable approach for high-resolution remote sensing segmentation that could be integrated into broader frameworks such as SAM.
Abstract
Remote sensing images are usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods rarely address these issues fully, which limits their performance on remote sensing image segmentation. In this paper, we propose LOGCAN++, a semantic segmentation model customized for remote sensing images, which consists of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The GCA module captures global representations for class-level context modeling to reduce the interference of background noise. The LCA module generates local class representations as intermediate perceptual elements that indirectly associate pixels with the global class representations, which addresses the large intra-class variance problem. In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations, so that the model tolerates scale and orientation variations in remotely sensed images. Extensive experiments on three benchmark datasets show that LOGCAN++ outperforms current mainstream general and remote sensing semantic segmentation methods and achieves a better trade-off between speed and accuracy. Code is available at https://github.com/xwmaxwma/rssegmentation.
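The indirect association between pixels and global class representations can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the implementation in the linked repository: class representations are assumed to be probability-weighted feature means, the association is assumed to be single-head dot-product attention, and the affine transformation step is left out entirely (the patch is a fixed slice). The names `class_centers` and `attend` are hypothetical.

```python
import math


def class_centers(features, probs):
    """Probability-weighted mean feature per class (an assumed definition).

    features: N vectors of length C; probs: N vectors of length K.
    """
    num_classes, dim = len(probs[0]), len(features[0])
    centers = []
    for k in range(num_classes):
        weight = sum(p[k] for p in probs) or 1.0  # guard against an empty class
        centers.append([
            sum(p[k] * f[d] for f, p in zip(features, probs)) / weight
            for d in range(dim)
        ])
    return centers


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head dot-product attention: each query becomes a mix of values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        )
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


# Toy image: 4 pixels, 2 feature channels, 2 classes (made-up values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

global_centers = class_centers(features, probs)         # whole image
local_centers = class_centers(features[:2], probs[:2])  # one patch of two pixels

# Local centers are first related to the global ones; the patch's pixels then
# attend only to the refined local centers, never to the global centers directly.
refined_local = attend(local_centers, global_centers, global_centers)
patch_context = attend(features[:2], refined_local, refined_local)
```

In this sketch the local centers act as the intermediate elements: a pixel whose appearance differs from the image-wide class average can still match its patch-level center, which in turn carries information from the global center.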
