M3S-Net: Multimodal Feature Fusion Network Based on Multi-scale Data for Ultra-short-term PV Power Forecasting

Penghui Niu, Taotao Cai, Suqi Zhang, Junhua Gu, Ping Zhang, Qiqi Liu, Jianxin Li

TL;DR

M3S-Net is a multimodal feature fusion network based on multi-scale data for ultra-short-term PV power forecasting. It incorporates a cross-modal Mamba interaction module with a dynamic C-matrix swapping mechanism, which enables deep structural coupling between the visual and temporal streams at linear computational complexity.

Abstract

The inherent intermittency and high-frequency variability of solar irradiance, particularly during rapid cloud advection, present significant stability challenges to high-penetration photovoltaic grids. Although multimodal forecasting has emerged as a viable mitigation strategy, existing architectures predominantly rely on shallow feature concatenation and binary cloud segmentation, thereby failing to capture the fine-grained optical features of clouds and the complex spatiotemporal coupling between visual and meteorological modalities. To bridge this gap, this paper proposes M3S-Net, a novel multimodal feature fusion network based on multi-scale data for ultra-short-term PV power forecasting. First, a multi-scale partial channel selection network leverages partial convolutions to explicitly isolate the boundary features of optically thin clouds, effectively transcending the precision limitations of coarse-grained binary masking. Second, a multi-scale sequence to image analysis network employs Fast Fourier Transform (FFT)-based time-frequency representation to disentangle the complex periodicity of meteorological data across varying time horizons. Crucially, the model incorporates a cross-modal Mamba interaction module featuring a novel dynamic C-matrix swapping mechanism. By exchanging state-space parameters between visual and temporal streams, this design conditions the state evolution of one modality on the context of the other, enabling deep structural coupling with linear computational complexity, thus overcoming the limitations of shallow concatenation. Experimental validation on the newly constructed fine-grained PV power dataset demonstrates that M3S-Net achieves a mean absolute error reduction of 6.2% in 10-minute forecasts compared to state-of-the-art baselines. The dataset and source code will be available at https://github.com/she1110/FGPD.
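
The cross-modal idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a simplified diagonal selective state-space model in NumPy, assumes that "C-matrix swapping" means each stream reads out its hidden state with the output matrix C computed from the other stream, and assumes both streams share the same sequence length and state size. All function names, weight shapes, and the random inputs are hypothetical.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Sequential scan of a diagonal selective SSM (cost linear in T).

    x:     (T, D) input sequence
    delta: (T, D) positive step sizes
    A:     (D, N) diagonal state entries (negative, so the state decays)
    B, C:  (T, N) input-dependent input/output projections
    Returns y with shape (T, D).
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))
    y = np.empty((T, D))
    for t in range(T):
        A_bar = np.exp(delta[t][:, None] * A)        # discretised state decay
        B_bar = delta[t][:, None] * B[t][None, :]    # discretised input matrix
        h = A_bar * h + B_bar * x[t][:, None]        # state update
        y[t] = h @ C[t]                              # readout with C_t
    return y

def ssm_params(x, W_delta, W_B, W_C):
    """Compute input-dependent (delta, B, C) from a stream's own features."""
    delta = np.logaddexp(0.0, x @ W_delta)           # softplus keeps delta > 0
    return delta, x @ W_B, x @ W_C

def cross_modal_swap(x_vis, x_tmp, p_vis, p_tmp, A):
    """Assumed swap: each stream keeps its own delta and B but uses the
    other stream's C for the readout."""
    d_v, B_v, C_v = ssm_params(x_vis, *p_vis)
    d_t, B_t, C_t = ssm_params(x_tmp, *p_tmp)
    y_vis = selective_scan(x_vis, d_v, A, B_v, C_t)  # visual state, temporal C
    y_tmp = selective_scan(x_tmp, d_t, A, B_t, C_v)  # temporal state, visual C
    return y_vis, y_tmp

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
T, D, N = 12, 4, 8

def make_weights():
    return (0.1 * rng.standard_normal((D, D)),       # W_delta
            0.1 * rng.standard_normal((D, N)),       # W_B
            0.1 * rng.standard_normal((D, N)))       # W_C

x_vis = rng.standard_normal((T, D))                  # stand-in for image features
x_tmp = rng.standard_normal((T, D))                  # stand-in for meteorological features
A = -np.exp(rng.standard_normal((D, N)))
p_vis, p_tmp = make_weights(), make_weights()

y_vis, y_tmp = cross_modal_swap(x_vis, x_tmp, p_vis, p_tmp, A)
print(y_vis.shape, y_tmp.shape)
```

In this sketch the only cross-modal cost is computing and exchanging the two C sequences, so each stream still needs a single pass over its T steps. With the swap in place, the visual output depends on the temporal input through C, which is one way to read the abstract's statement that one modality is conditioned on the context of the other.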

Paper Structure (29 sections, 17 equations, 16 figures, 7 tables)

Figures (16)

  • Figure 1: The overall diagram of the proposed M3S-Net. The framework accomplishes the prediction task through three core components: 1. Fine-grained visual extraction branch; 2. Multi-scale temporal imaging branch; 3. Cross-modal Mamba fusion branch.
  • Figure 2: The structure of the proposed MPCS-Net, which is an Encoder-Decoder architecture. The spatial-channel selection mechanism is embedded within the encoder to form the MSPC and MSPA.
  • Figure 3: The structure of the proposed MPCS and MSPA, both of which include (a) the SCSM. The core modules of the SCSM are (b) the CSIA and (c) the CE.
  • Figure 4: The flowchart of the multi-scale representation.
  • Figure 5: The flowchart of the CSA to capture nonlinear interdependencies among variables on the coarsest scale $X_{M}$.
  • ...and 11 more figures