Multitask GLocal OBIA-Mamba for Sentinel-2 Landcover Mapping

Zack Dewis, Yimin Zhu, Zhengsen Xu, Mabel Heffring, Saeid Taleghanidoozdoozan, Kaylee Xiao, Motasem Alkayid, Lincoln Linlin Xu

TL;DR

This work tackles Sentinel-2 land use/land cover (LULC) mapping, which is hampered by spatial heterogeneity and signature ambiguity. It introduces Multitask GLocal OBIA-Mamba (MSOM), which combines an OBIA-Mamba module with a dual-branch GLocal CNN-Mamba architecture and a multitask loss to balance local precision and global coherence. Key innovations include using superpixel tokens within a state-space Mamba framework for efficient global modeling, a CNN-Mamba dual pathway for local-global fusion, and a weighted loss ($L_{\text{total}} = \alpha L_{\text{local}} + \beta L_{\text{global}}$) with $\alpha=0.7$, $\beta=0.3$. On Alberta Sentinel-2 data, MSOM outperforms state-of-the-art baselines in overall accuracy (OA), average accuracy (AA), and Kappa, while preserving edges and finer details, demonstrating both accuracy and computational efficiency. This approach offers a scalable and robust pipeline for LULC mapping in settings with uncertain ground-truth boundaries.
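The weighted loss can be illustrated with a minimal sketch. It assumes that both $L_{\text{local}}$ and $L_{\text{global}}$ are per-pixel cross-entropy terms, which the summary does not state; the function names `cross_entropy` and `multitask_loss` are hypothetical and are not taken from the paper's code. Only the weights $\alpha=0.7$ and $\beta=0.3$ come from the text.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class.

    probs:  list of per-sample class-probability lists
    labels: list of integer class indices
    (Assumption: the paper's branch losses may differ from this.)
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def multitask_loss(local_probs, global_probs, labels, alpha=0.7, beta=0.3):
    """L_total = alpha * L_local + beta * L_global, weights from the text."""
    l_local = cross_entropy(local_probs, labels)    # CNN (local) branch
    l_global = cross_entropy(global_probs, labels)  # Mamba (global) branch
    return alpha * l_local + beta * l_global

# Toy example: two pixels, two classes.
labels = [0, 1]
local_probs = [[0.9, 0.1], [0.2, 0.8]]
global_probs = [[0.6, 0.4], [0.5, 0.5]]
total = multitask_loss(local_probs, global_probs, labels)
```

With $\alpha > \beta$, errors in the local branch contribute more to the total than equally sized errors in the global branch, which matches the stated goal of balancing local precision against global consistency.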

Abstract

Although Sentinel-2 based land use and land cover (LULC) classification is critical for various environmental monitoring applications, it is a very difficult task due to several key data challenges (e.g., spatial heterogeneity, context information, signature ambiguity). This paper presents a novel Multitask GLocal OBIA-Mamba (MSOM) for enhanced Sentinel-2 classification with the following contributions. First, an object-based image analysis (OBIA) Mamba model (OBIA-Mamba) is designed to reduce redundant computation without compromising fine-grained details by using superpixels as Mamba tokens. Second, a global-local (GLocal) dual-branch convolutional neural network (CNN)-Mamba architecture is designed to jointly model local spatial detail and global contextual information. Third, a multitask optimization framework is designed to employ dual loss functions to balance local precision with global consistency. The proposed approach is tested on Sentinel-2 imagery of Alberta, Canada, in comparison with several advanced classification approaches, and the results demonstrate that it achieves higher classification accuracy and finer details than the other state-of-the-art methods.
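The idea of using superpixels as tokens can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a superpixel label map is already available (for example from a segmentation algorithm such as SLIC) and that each token is the mean feature vector of its superpixel. The abstract does not say how tokens are formed, and the helper name `superpixel_tokens` is hypothetical.

```python
def superpixel_tokens(features, segments):
    """Average per-pixel feature vectors within each superpixel.

    features: H x W grid (list of rows) of feature vectors (lists of floats)
    segments: H x W grid of integer superpixel ids
    Returns a dict mapping superpixel id -> mean feature vector.
    (Assumption: mean pooling; the paper may build tokens differently.)
    """
    sums, counts = {}, {}
    for feat_row, seg_row in zip(features, segments):
        for vec, sid in zip(feat_row, seg_row):
            if sid not in sums:
                sums[sid] = [0.0] * len(vec)
                counts[sid] = 0
            sums[sid] = [s + v for s, v in zip(sums[sid], vec)]
            counts[sid] += 1
    return {sid: [s / counts[sid] for s in sums[sid]] for sid in sums}

# Toy 2 x 2 image with one feature channel and two superpixels.
features = [[[1.0], [3.0]],
            [[10.0], [20.0]]]
segments = [[0, 0],
            [1, 1]]
tokens = superpixel_tokens(features, segments)
```

In this toy case four pixels collapse into two tokens, so a sequence model scanning the tokens processes half as many elements. On real imagery the reduction depends on the superpixel size chosen.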

Paper Structure

This paper contains 10 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Traditional Mamba approaches (top) treat each pixel as a token and scan the tokens in a fixed, predefined, dense and rigid manner, whereas our OBIA-Mamba approach (bottom) treats superpixels/objects as tokens and builds the token sequence in a dynamic, learnable, sparse and adaptable manner, leading to reduced computational cost, improved edge preservation and enhanced longer-range, larger-scale modelling capabilities.
  • Figure 2: The proposed OBIA-Mamba is a GLocal dual-branch architecture that features a local ResNet branch and a global OBIA-Mamba branch. The OBIA-Mamba branch leverages superpixels, which reduces redundant computation by replacing pixel scanning with superpixel scanning. The two branches are joined to produce a map that jointly models local spatial details and global contextual information. A multitask loss guides both the local and global branches to balance local precision and global consistency.
  • Figure 3: The maps generated by the various models for the city of Edmonton, Alberta. Compared against the RGB image, our model provides the most detail while also having the least noise. Our model depicts the entire river flowing through the city, which other models struggle to do. Our approach also demonstrates strong edge preservation, allowing prediction of small green patches throughout the city where natural classes can be observed in the RGB image.
  • Figure 4: Maps of the province of Alberta generated by the various models. Our model has the best spatial consistency with the ground truth. It does not suffer from overprediction of the urban class to the same extent as models such as RNN, LSTM and ResNet. Unlike SSRN, HRNet and ViT, our approach also does not underestimate the wetland class in Northern Alberta.