
Vision-Based CNN Prediction of Sunspot Numbers from SDO/HMI Images

Fabian C. Quintero-Pareja, Diederik A. Montano-Burbano, Santiago Quintero-Pareja, D. Sierra-Porta

TL;DR

The paper tackles the problem of estimating daily sunspot numbers directly from full-disk SDO/HMI continuum images, addressing the limitations of manual counting and handcrafted features. It trains a CNN-based regression model on data from 2011–2024, pairing daily HMI images with SILSO Version 2.0 sunspot counts using a chronological 80/20 split and the Huber loss. The model achieves strong results on the test set, with $R^2 = 0.986$ and $\mathrm{RMSE} = 6.25$, and exhibits high correlation with the reference series; interpretability analyses using Grad-CAM and Integrated Gradients confirm focus on sunspot regions and related textures. The work demonstrates a scalable, end-to-end vision-based approach for real-time estimation of heliophysical indices, with potential to enhance operational space weather monitoring and reduce reliance on manual sunspot counting.
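The chronological 80/20 split and the Huber loss mentioned above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `chronological_split` and `huber_loss`, the `(date, value)` sample layout, and the threshold `delta=1.0` are all assumptions, since the summary does not state them.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, float]  # (ISO date string, sunspot number); hypothetical layout


def chronological_split(
    samples: Sequence[Sample], train_fraction: float = 0.8
) -> Tuple[List[Sample], List[Sample]]:
    """Sort samples by date and cut once, so every test day follows every training day."""
    ordered = sorted(samples, key=lambda s: s[0])  # ISO dates sort lexicographically
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def huber_loss(
    y_true: Sequence[float], y_pred: Sequence[float], delta: float = 1.0
) -> float:
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


# Ten consecutive days with made-up values, purely for illustration.
demo = [(f"2011-01-{day:02d}", float(day)) for day in range(10, 0, -1)]
train, test = chronological_split(demo)
```

Splitting by time rather than at random keeps images from neighbouring days, which look nearly identical, from landing on both sides of the split. The linear tail of the Huber loss limits the influence of days with unusually large errors compared with a pure squared-error loss.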

Abstract

Sunspot numbers constitute the longest and most widely used record of solar activity, with direct implications for space weather forecasting and heliophysical research. Traditional sunspot counting relies on visual inspection or algorithmic feature detection, both of which are limited by subjectivity, image quality, and methodological inconsistencies. Recent advances in deep learning, particularly convolutional neural networks (CNNs), enable the direct use of solar imagery for automated prediction tasks, reducing reliance on manual feature engineering. In this work, we present a supervised vision-based regression framework to estimate daily sunspot numbers from full-disk continuum images acquired by the Helioseismic and Magnetic Imager (HMI) onboard NASA's Solar Dynamics Observatory (SDO). Images from 2011–2024 were paired with daily sunspot numbers from the SILSO Version 2.0 dataset of the Royal Observatory of Belgium. After preprocessing and augmentation, a CNN was trained to predict scalar sunspot counts directly from pixel data. The proposed model achieved strong predictive performance, with $R^2 = 0.986$ and $\mathrm{RMSE} = 6.25$ on the test set, indicating close agreement with SILSO reference values. Comparative evaluation against prior studies shows that our approach performs competitively with, and in several cases outperforms, statistical and hybrid machine learning methods, while offering the novel advantage of bypassing explicit detection and manual feature extraction. Interpretability analyses using Grad-CAM and Integrated Gradients confirmed that the network consistently attends to sunspot regions when forming predictions. These results highlight the potential of deep vision-based approaches for operational solar monitoring, providing a scalable and automated pathway for real-time estimation of classical heliophysical indices.
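For readers who want to see how the two reported metrics are defined, the following is a small sketch using the standard textbook formulas for RMSE and the coefficient of determination. The function names and the toy values are illustrative assumptions; they are not the paper's data or evaluation code.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up sunspot numbers, purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
toy_rmse = rmse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
```

RMSE is expressed in the same units as the sunspot number itself, while $R^2$ is unitless and measures the share of the variance in the reference series that the predictions account for.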

Paper Structure

This paper contains 9 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Daily sunspot number time series from SILSO (Version 2.0) between 2011 and 2024, including the smoothed solar cycle trend (https://www.sidc.be/SILSO/datafiles). This dataset served as the reference ground truth for model training and evaluation.
  • Figure 2: Representative continuum intensity images from the Helioseismic and Magnetic Imager (HMI) aboard SDO (https://soho.nascom.nasa.gov/data/REPROCESSING/Completed/). The images illustrate solar disk evolution across different phases of solar activity (2011–2024). These full-disk images were preprocessed and paired with SILSO daily sunspot numbers for supervised regression.
  • Figure 3: Predicted versus observed daily sunspot numbers. Solid line: identity; dashed line: linear fit. The points cluster around the identity line for both splits, indicating strong agreement.
  • Figure 4: Residuals (Predicted $-$ True) versus predicted value. Residuals are centered near zero with no strong heteroscedastic pattern; slight underestimation appears at the highest activity levels.
  • Figure 5: Bland–Altman plots showing agreement between predictions and ground truth. Dashed line: mean difference (bias); dotted lines: limits of agreement ($\pm1.96\sigma$). Bias is close to zero; wider dispersion occurs for high sunspot activity.
  • ...and 3 more figures
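The bias and limits of agreement described in the Bland–Altman caption (Figure 5) can be computed as in the sketch below. It assumes differences are taken as predicted minus true, matching the residual convention in Figure 4, and uses the sample standard deviation for $\sigma$; the summary does not say which estimator the authors used, and the function name `bland_altman` and the toy values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) with limits at bias +/- 1.96 * sd."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus true
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (assumption)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up sunspot numbers, purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
bias, lower, upper = bland_altman(observed, predicted)
```

A bias near zero with narrow limits indicates that the predictions neither systematically over- nor underestimate the reference values; the caption's remark about wider dispersion at high activity corresponds to differences spreading further from the bias line as the sunspot number grows.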