Advances and Challenges in Solar Flare Prediction: A Review

Mingfu Shao, Suo Liu, Haiqing Xu, Peng Jia, Hui Wang, Liyue Tong, Yang Bai, Chen Yang, Yuyang Li, Nan Li, Jiaben Lin

TL;DR

This review surveys solar flare forecasting from data and method perspectives, tracing the shift from physics-based to data-driven and multimodal large-model approaches. It catalogs data sources (GOES X-ray flux, SHARP, HMI/FMG, ASO-S), public datasets (e.g., Boucheron), and evaluation metrics (notably TSS) while detailing platform-level online forecasts (DeepSun, SolarFlareNet, MViT) and benchmarks (CCMC Scoreboard). It highlights that while deep learning and MLMs improve predictive performance, practical operational reliability is hampered by limited cross-cycle data, data imbalance, and insufficient physical interpretability. The authors advocate for authoritative, multi-cycle benchmarks, physically grounded yet scalable models, end-to-end online learning, and multi-task large-model frameworks to advance real-time, robust space-weather forecasting.
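
As a rough illustration of why TSS (the True Skill Statistic) is singled out when data are imbalanced, the sketch below computes it from a binary confusion matrix using the standard definition, TSS = TP/(TP+FN) - FP/(FP+TN). The function names and the sample counts are hypothetical and are not taken from the review.

```python
def true_skill_statistic(tp: int, fn: int, fp: int, tn: int) -> float:
    """TSS = recall - false alarm rate = TP/(TP+FN) - FP/(FP+TN)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS is undefined when one class has no samples")
    return tp / (tp + fn) - fp / (fp + tn)


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical counts: 1000 forecast windows, only 50 of which contain a flare.
# A forecaster that never predicts a flare looks accurate but has no skill.
never_flare = dict(tp=0, fn=50, fp=0, tn=950)
print(accuracy(**never_flare))              # 0.95
print(true_skill_statistic(**never_flare))  # 0.0
```

Because each term is normalized within its own class, the score does not change when the flare to no-flare ratio changes, which makes it easier to compare results obtained on datasets with different class balance.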

Abstract

Solar flares, as one of the most prominent manifestations of solar activity, have a profound impact on both the Earth's space environment and human activities. As a result, accurate solar flare prediction has emerged as a central topic in space weather research. In recent years, substantial progress has been made in the field of solar flare forecasting, driven by the rapid advancements in space observation technology and the continuous improvement of data processing capabilities. This paper presents a comprehensive review of the current state of research in this area, with a particular focus on tracing the evolution of data-driven approaches -- which have progressed from early statistical learning techniques to more sophisticated machine learning and deep learning paradigms, and most recently, to the emergence of Multimodal Large Models (MLMs). Furthermore, this study examines the realistic performance of existing flare forecasting platforms, elucidating their limitations in operational space weather applications and thereby offering a practical reference for future advancements in technological optimization and system design.

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: A comparison of the total energies released by flares across different scales with the energy partitioning in a major event. The upper segment depicts the distribution of energy forms in a large flare, whereas the lower segment contrasts the relative magnitudes of total energy for different flare classes (huang2024short).
  • Figure 2: GOES X-ray flux measured at 1-minute intervals over a one-day period. The horizontal axis denotes Coordinated Universal Time (UTC), while the vertical axis shows X-ray flux in watts per square meter (W·m⁻²) on a logarithmic scale. The right-hand side indicates the GOES soft X-ray flare classification scheme (A, B, C, M, X), based on peak flux in the 1-8 Å band. The long channel (1-8 Å; 0.1-0.8 nm) of the GOES-18 and GOES-19 X-Ray Sensor (XRS) records the longer-wavelength soft X-ray flux, whereas the short channel (0.5-4 Å; 0.05-0.4 nm) measures the shorter-wavelength (harder) X-ray flux. Data source: National Oceanic and Atmospheric Administration (NOAA), SWPC.
  • Figure 3: Comparison of different solar flare prediction methods within 24 hours. Model: the foundation model underlying each method. Type: AR.C uses central active-region data; AR.F includes data from all active regions; FD represents full-disk data. Numbers (1, 2, 3) indicate data dimensionality: 1D textual parameters, 2D images, or 3D time sequences of images. Input: the type of data input, with "single" denoting a single image or parameter and "series" denoting a time sequence of images or parameters. Level: the class of flares predicted by the model. Table: the corresponding table in the original paper. All evaluation metrics were extracted from the original papers. '---' indicates unavailable data; '*' indicates data that were not explicitly provided in the original articles but were inferred through our analysis.
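
The GOES classification scheme referred to in the Figure 2 caption maps the peak 1-8 Å flux to a letter (A, B, C, M, X) by decade, plus a multiplier within that decade. A minimal sketch of that mapping follows, assuming the standard decade thresholds (A from 10⁻⁸, B from 10⁻⁷, C from 10⁻⁶, M from 10⁻⁵, X from 10⁻⁴ W·m⁻²). The function name `goes_class` is hypothetical and does not come from the paper or from any NOAA software.

```python
# Lower flux bound (W·m⁻²) of each GOES class, strongest first.
# The X class is open-ended, so multipliers above 9.9 are allowed there.
_CLASS_FLOORS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7), ("A", 1e-8)]


def goes_class(peak_flux_wm2: float) -> str:
    """Return a label such as 'M5.2' for a peak 1-8 Å flux in W·m⁻²."""
    if peak_flux_wm2 <= 0:
        raise ValueError("peak flux must be positive")
    for letter, floor in _CLASS_FLOORS:
        if peak_flux_wm2 >= floor:
            # Multiplier within the decade, rounded to one decimal place.
            return f"{letter}{peak_flux_wm2 / floor:.1f}"
    # Fluxes below the A1.0 level are reported as a fractional A class.
    return f"A{peak_flux_wm2 / 1e-8:.1f}"


print(goes_class(5.2e-5))  # M5.2
print(goes_class(2.3e-4))  # X2.3
```

Rounding to one decimal means a flux just below a decade boundary (for example 9.99e-6) is formatted as "C10.0"; a production implementation would need an explicit rule for such cases.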