
Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation

Xianping Ma, Ziyao Wang, Yin Hu, Xiaokang Zhang, Man-On Pun

TL;DR

The paper addresses the challenge of leveraging high-dimensional encoder features for accurate remote sensing semantic segmentation by introducing DeepKANSeg, a novel encoder-decoder network built on Kolmogorov–Arnold Networks (KANs). It replaces the traditional MLP layers of a global-local decoder with KAN-based layers (GLKAN) and uses a stacked KAN feature refinement module (DeepKAN) to decompose complex high-dimensional representations into univariate transformations, following the Kolmogorov–Arnold form $f(\boldsymbol{x}) = \sum_q \phi_q\left(\sum_p \psi_{pq}(x_p)\right)$; this decomposition is credited with improved feature learning and interpretability. Evaluated on ISPRS Vaihingen and Potsdam with ResNet-18 and ViT-L backbones, DeepKANSeg delivers superior mF1 and mIoU, with notable gains on finely detailed structures and in modeling long-range context. The findings demonstrate the potential of KAN-based modules to enhance remote sensing segmentation, while the authors acknowledge increased computational complexity and reliance on pretrained encoders as areas for future work on optimization and interpretability.
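For reference, mF1 and mIoU are the per-class F1 score and intersection-over-union averaged over classes. The sketch below computes both from a pixel-level confusion matrix using the standard definitions, F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The helper names are hypothetical, the counts are made up, and whether the paper excludes any class (such as clutter) from the mean is not stated in this section.

```python
from typing import List, Tuple


def per_class_f1_iou(conf: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Per-class F1 and IoU from a square confusion matrix.

    conf[i][j] counts pixels with ground-truth class i predicted as class j.
    """
    n = len(conf)
    f1s, ious = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # missed pixels of class c
        fp = sum(conf[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # Convention (assumption): a class absent from both GT and prediction scores 0.
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return f1s, ious


def mean_f1_iou(conf: List[List[int]]) -> Tuple[float, float]:
    """Return (mF1, mIoU): unweighted means over all classes in conf."""
    f1s, ious = per_class_f1_iou(conf)
    return sum(f1s) / len(f1s), sum(ious) / len(ious)


# Toy 2-class example (made-up counts, not results from the paper).
mf1, miou = mean_f1_iou([[8, 2], [1, 9]])
print(round(mf1, 4), round(miou, 4))  # 0.8496 0.7386
```

Per class, F1 = 2·IoU / (1 + IoU), so the two metrics order any single class identically; their class means can still differ, which is why papers on these benchmarks typically report both.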

Abstract

Semantic segmentation plays a crucial role in remote sensing applications, where the accurate extraction and representation of features are essential for high-quality results. Despite the widespread use of encoder-decoder architectures, existing methods often struggle to fully utilize the high-dimensional features extracted by the encoder and to efficiently recover detailed information during decoding. To address these problems, we propose a novel semantic segmentation network, named DeepKANSeg, which includes two key innovations based on the emerging Kolmogorov–Arnold Network (KAN). Notably, the advantage of KAN lies in its ability to decompose high-dimensional complex functions into univariate transformations, enabling efficient and flexible representation of intricate relationships in data. First, we introduce a KAN-based deep feature refinement module, named DeepKAN, to effectively capture complex spatial and rich semantic relationships from high-dimensional features. Second, we replace the traditional multi-layer perceptron (MLP) layers in the global-local combined decoder with KAN-based linear layers, named GLKAN. This module enhances the decoder's ability to capture fine-grained details during decoding. To evaluate the effectiveness of the proposed method, experiments are conducted on two well-known fine-resolution remote sensing benchmark datasets, ISPRS Vaihingen and ISPRS Potsdam. The results demonstrate that the KAN-enhanced segmentation model achieves higher accuracy than state-of-the-art methods. These results highlight the potential of KANs as a powerful alternative to traditional architectures in semantic segmentation tasks. Moreover, the explicit univariate decomposition provides improved interpretability, which is particularly beneficial for applications requiring explainable learning in remote sensing.
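To make the idea of decomposing a multivariate function into univariate transformations concrete, here is a textbook illustration (not taken from the paper) written in the form $f(\boldsymbol{x}) = \sum_q \phi_q\left(\sum_p \psi_{pq}(x_p)\right)$: the product $x_1 x_2 = \frac{(x_1+x_2)^2 - (x_1-x_2)^2}{4}$ needs only univariate functions and addition. The helper `ka_compose` is a hypothetical name; in a KAN the $\psi_{pq}$ and $\phi_q$ are learned from data rather than chosen by hand.

```python
from typing import Callable, Sequence

Univariate = Callable[[float], float]


def ka_compose(outer: Sequence[Univariate],
               inner: Sequence[Sequence[Univariate]],
               x: Sequence[float]) -> float:
    """Evaluate f(x) = sum_q outer[q]( sum_p inner[p][q](x[p]) ).

    inner[p][q] plays the role of psi_pq and outer[q] the role of phi_q.
    """
    total = 0.0
    for q, phi_q in enumerate(outer):
        s = sum(inner[p][q](x_p) for p, x_p in enumerate(x))  # inner sum over inputs p
        total += phi_q(s)                                      # outer univariate map
    return total


# Textbook identity (not from the paper): x1 * x2 = ((x1 + x2)^2 - (x1 - x2)^2) / 4
inner = [
    [lambda t: t, lambda t: t],    # psi_11, psi_12 applied to x1
    [lambda t: t, lambda t: -t],   # psi_21, psi_22 applied to x2
]
outer = [lambda s: s * s / 4, lambda s: -s * s / 4]  # phi_1, phi_2

print(ka_compose(outer, inner, [3.0, 5.0]))  # 15.0
```

Multiplication, an inherently two-variable interaction, is recovered exactly from sums and squares of single variables; this is the kind of structure a KAN can express through its learnable univariate functions.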

Paper Structure

This paper contains 21 sections, 7 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: The structure of a two-layer KAN, following Eq. (1). It learns through multiple learnable activation functions.
  • Figure 2: Overview of the proposed DeepKANSeg, which comprises three parts: an encoder, a deep feature refinement module, and a decoder. Combining a pre-trained encoder with the KAN-based deep feature refinement module and decoder further enhances semantic segmentation performance.
  • Figure 3: (a) The structure of the proposed DeepKAN, (b) the structure of its core refinement module, the KAN block, and (c) a simplified illustration of the KAN layer. The high-dimensional features are refined by the stacked KAN layers, which is crucial for extracting complex semantic information from remote sensing images.
  • Figure 4: The structure of the proposed GLKAN. It resembles the classic transformer block, with the components highlighted in the red box replaced by KAN-based blocks.
  • Figure 5: Four visual samples of size $1024 \times 1024$ pixels, from the Vaihingen (first two columns) and Potsdam (last two columns) datasets. The first row contains orthophotos, with Vaihingen shown in NIRRG channels and Potsdam in RGB channels; the second row shows the corresponding ground truth. The diverse and complex features of the urban environments in these datasets make semantic segmentation particularly challenging, so experiments on both datasets provide a comprehensive validation of the model's performance.
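The KAN layers in Figures 1 and 3 place a learnable univariate function on every input-output edge and sum the results, $y_j = \sum_i \phi_{j,i}(x_i)$, instead of applying a fixed activation after a linear map as an MLP does. The following is a minimal pure-Python sketch under stated assumptions: each edge function is a weighted sum of fixed Gaussian radial basis functions (a simplification in the style of RBF-based KAN variants; the original KAN uses B-splines plus a SiLU base term), only the forward pass is shown, and the class names and layer widths are hypothetical rather than DeepKANSeg's actual configuration.

```python
import math
import random
from typing import List


class KANLayer:
    """One KAN layer: y_j = sum_i phi_{j,i}(x_i).

    Each edge function phi_{j,i} is a learnable univariate function, modeled here
    (assumption) as a weighted sum of fixed Gaussian radial basis functions.
    """

    def __init__(self, n_in: int, n_out: int, n_basis: int = 8,
                 lo: float = -2.0, hi: float = 2.0, seed: int = 0):
        if n_basis < 2:
            raise ValueError("n_basis must be at least 2")
        rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        step = (hi - lo) / (n_basis - 1)
        self.centers = [lo + k * step for k in range(n_basis)]  # evenly spaced grid
        self.width = step                                        # RBF bandwidth
        # coef[j][i][k]: learnable weight of basis k on the edge input i -> output j
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(n_in)] for _ in range(n_out)]

    def basis(self, x: float) -> List[float]:
        """Evaluate all Gaussian basis functions at scalar x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def __call__(self, x: List[float]) -> List[float]:
        if len(x) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(x)}")
        feats = [self.basis(xi) for xi in x]  # basis values, computed once per input
        out = []
        for j in range(self.n_out):
            # Sum the univariate edge functions phi_{j,i}(x_i) over all inputs i.
            out.append(sum(
                sum(w * b for w, b in zip(self.coef[j][i], feats[i]))
                for i in range(self.n_in)))
        return out


class KAN:
    """Stacked KAN layers; widths [n_in, n_hidden, n_out] gives a two-layer KAN."""

    def __init__(self, widths: List[int], seed: int = 0):
        self.layers = [KANLayer(a, b, seed=seed + k)
                       for k, (a, b) in enumerate(zip(widths[:-1], widths[1:]))]

    def __call__(self, x: List[float]) -> List[float]:
        for layer in self.layers:
            x = layer(x)
        return x


# Hypothetical widths: 4 input channels -> 9 hidden -> 3 outputs.
model = KAN([4, 9, 3])
print(len(model([0.1, -0.5, 1.2, 0.0])))  # 3
```

In a segmentation decoder, a layer like this would be applied to each pixel's or token's channel vector where a linear layer would otherwise sit; this section does not confirm that GLKAN uses exactly that arrangement. Inputs far outside the basis grid map to values near zero here, which is one reason practical KAN implementations normalize inputs or adapt their grids.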