MPCM-Net: Multi-scale network integrates partial attention convolution with Mamba for ground-based cloud image segmentation

Penghui Niu, Jiashuai She, Taotao Cai, Yajuan Zhang, Ping Zhang, Junhua Gu, Jianxin Li

TL;DR

This work targets ground-based cloud image segmentation for photovoltaic (PV) power forecasting, addressing limitations in multi-scale context handling, attention efficiency, and global guidance in decoders. It introduces MPCM-Net, an encoder-decoder framework that pairs a Multi-scale Partial Attention Convolution (MPAC) encoder with a Multi-scale Mamba Decoding (MMD) path, whose M2B block uses a Spatial-Semantic Hybrid Domain (SSHD) to balance accuracy and inference speed. A new CSRC dataset with fine-grained, radiative-color labels is released to support realistic benchmarking. Experiments show MPCM-Net outperforms state-of-the-art methods in MIoU while maintaining real-time or near-real-time performance, highlighting its potential for operational PV forecasting and grid integration.
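Since the comparison above is reported in MIoU, a minimal sketch of how mean intersection-over-union is commonly computed may help. The function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not given here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class labels
    (a segmentation mask flattened to 1D). Illustrative helper only.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (e.g. sky, thin cloud, thick cloud).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2
```

Averaging per-class IoU rather than pixel accuracy keeps small classes from being drowned out by a dominant background class, which is the usual reason segmentation work reports this metric.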

Abstract

Ground-based cloud image segmentation is a critical research domain for photovoltaic power forecasting. Current deep learning approaches primarily focus on encoder-decoder architectural refinements. However, existing methodologies exhibit several limitations: (1) they rely on dilated convolutions for multi-scale context extraction, which lack partial feature effectiveness and inter-channel interoperability; (2) attention-based feature enhancement implementations neglect the accuracy-throughput balance; and (3) decoder modifications fail to establish global interdependencies among hierarchical local features, limiting inference efficiency. To address these challenges, we propose MPCM-Net, a Multi-scale network that integrates Partial attention Convolutions with Mamba architectures to enhance segmentation accuracy and computational efficiency. Specifically, the encoder incorporates MPAC, which comprises: (1) an MPC block with ParCM and ParSM that enables global spatial interaction across multi-scale cloud formations, and (2) an MPA block combining ParAM and ParSM to extract discriminative features with reduced computational complexity. On the decoder side, an M2B is employed to mitigate contextual loss through an SSHD that maintains linear complexity while enabling deep feature aggregation across spatial and scale dimensions. As a key contribution to the community, we also introduce and release the CSRC dataset, a clear-label, fine-grained segmentation benchmark designed to overcome the critical limitations of existing public datasets. Extensive experiments on CSRC demonstrate the superior performance of MPCM-Net over state-of-the-art methods, achieving an optimal balance between segmentation accuracy and inference speed. The dataset and source code will be available at https://github.com/she1110/CSRC.
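The linear-complexity claim for the SSHD rests on state-space models, which process a sequence with a single recurrent pass. The sketch below is a generic scalar state-space recurrence, not the paper's 2D-SSM or Mamba's selective scan; the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions chosen only to show why the cost grows linearly with sequence length.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Scalar linear state-space recurrence (illustrative only).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each step does constant work, so a sequence of length L costs O(L),
    in contrast to the O(L^2) pairwise interactions of full self-attention.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # update the hidden state with the new input
        ys.append(c * h)   # read out the current state
    return ys


outputs = ssm_scan([1.0, 0.0, 0.0, 2.0])
print(outputs)  # → [1.0, 0.5, 0.25, 2.125]
```

The hidden state carries information from every earlier position, so each output depends on the whole prefix even though no pairwise comparison is ever computed.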

Paper Structure

This paper contains 26 sections, 19 equations, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: (a) illustrates multi-scale clouds. The red, yellow, and green blocks represent large-, medium-, and small-scale cloud clusters, respectively. In adjacent frames, the same cloud appears at different scales. (b) illustrates the incomplete extraction of cloud boundaries. Note that the results are from U-Net.
  • Figure 2: The structure of the proposed MPCM-Net, which is composed of two parts: Multi-scale Partial Attention Convolution Encoding (MPAC) and Multi-scale Mamba Decoding (MMD). The MPAC comprises two key components: the MPC block and the MPA block. The MMD incorporates an M2B block.
  • Figure 3: The structure of the proposed MPCM-Net, which is composed of two parts: Multi-scale Partial Attention Convolution Encoding (MPAC) and Multi-scale Mamba Decoding (MMD). The MPAC comprises two key components: the MPC block and the MPA block. The MMD incorporates an M2B block.
  • Figure 4: The overall architecture of the M2B module comprises three parallel branches that extract multi-scale features with varying receptive fields. These features are subsequently fused and processed through the SSHD to generate the final multi-scale global feature representation, denoted as $f_{s}$.
  • Figure 5: (a) The proposed SSHD splits its input into two parts, $X_{C\times \left(1-R\right)}$ and $X_{C\times R}$. $X_{C\times \left(1-R\right)}$ passes through the 2D-SSM and $X_{C\times R}$ passes through the HA module, together yielding the SSHD output $X_{s}$. The bottom right of the figure explains the notation for some modules.
  • ...and 6 more figures
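The Figure 5 caption describes the SSHD as a channel split governed by a ratio $R$: a $C\times(1-R)$ portion goes to the 2D-SSM and a $C\times R$ portion goes to the HA module. A minimal sketch of that routing follows. Everything beyond the split itself is an assumption: the names `split_channels` and `sshd_sketch` are hypothetical, the order of the two portions is arbitrary, the branches are placeholder callables, and merging by concatenation is a guess, since the caption does not say how the branch outputs are combined.

```python
def split_channels(x, ratio):
    """Split a list of C channel maps into (ssm_part, ha_part).

    ratio plays the role of R in the caption: round(C * R) channels go
    to the HA branch and the remaining channels to the SSM branch.
    Which channels go where is an assumption made for illustration.
    """
    c = len(x)
    n_ha = round(c * ratio)
    n_ssm = c - n_ha
    return x[:n_ssm], x[n_ssm:]


def sshd_sketch(x, ratio, ssm_branch, ha_branch):
    """Route each portion through its branch and merge the results.

    ssm_branch / ha_branch are placeholders for the 2D-SSM and HA
    modules; list concatenation as the merge step is an assumption.
    """
    ssm_part, ha_part = split_channels(x, ratio)
    return ssm_branch(ssm_part) + ha_branch(ha_part)


# Toy usage: 8 "channels" represented by integers, R = 0.25.
channels = list(range(8))
ssm_part, ha_part = split_channels(channels, 0.25)
print(len(ssm_part), len(ha_part))  # → 6 2

# With identity branches the channels pass through unchanged.
merged = sshd_sketch(channels, 0.25, lambda p: list(p), lambda p: list(p))
```

Sending only a fraction of the channels through each branch is what keeps the per-layer cost below that of applying both operations to every channel.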