Solar synthetic imaging: Introducing denoising diffusion probabilistic models on SDO/AIA data

Francesco P. Ramunno, S. Hackstein, V. Kinakh, M. Drozdova, G. Quetant, A. Csillaghy, S. Voloshynovskiy

TL;DR

The paper tackles data scarcity in solar flare forecasting by using denoising diffusion probabilistic models (DDPMs) to generate labeled synthetic full-disc solar images from SDO/AIA 171 Å data. It demonstrates three conditioning strategies (discrete GOES classes, continuous X-ray values, and ceVAE embeddings) and evaluates them with cluster metrics, Fréchet Inception Distance (FID), and macro F1, finding that discrete GOES conditioning yields the best balance between realism and discriminability. The study shows that synthetic data can improve classifier performance on underrepresented flare classes and enhance 24-hour flare-prediction metrics, highlighting the practical role of diffusion-based data augmentation in heliophysics. Overall, the work establishes DDPMs as a promising tool for generating physically plausible solar imagery and aiding downstream forecasting and analysis tasks, with future work focusing on larger image sizes, broader data, and physics-grounded validation.
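
As a rough illustration of the discrete GOES conditioning labels mentioned above, the following sketch maps a soft X-ray flux value to its flare class letter. It assumes the conventional decade thresholds on the GOES 1-8 Å flux in W/m²; the function name `goes_class` and the table `GOES_THRESHOLDS` are hypothetical, and the paper's exact labelling procedure is not given in this summary.

```python
# Conventional GOES flare classes, keyed by the lower bound of the
# 1-8 Å soft X-ray flux in W/m^2. Anything below the B threshold is class A.
# These are the standard decade boundaries, not values taken from the paper.
GOES_THRESHOLDS = [
    ("X", 1e-4),
    ("M", 1e-5),
    ("C", 1e-6),
    ("B", 1e-7),
]


def goes_class(flux_wm2: float) -> str:
    """Return the GOES class letter (A, B, C, M or X) for a flux value."""
    if flux_wm2 < 0:
        raise ValueError("X-ray flux must be non-negative")
    # Thresholds are sorted from strongest to weakest class,
    # so the first lower bound that the flux reaches decides the label.
    for label, lower_bound in GOES_THRESHOLDS:
        if flux_wm2 >= lower_bound:
            return label
    return "A"


if __name__ == "__main__":
    for flux in (3e-8, 4e-7, 5e-6, 2e-5, 2e-4):
        print(f"{flux:.1e} W/m^2 -> class {goes_class(flux)}")
```

A discrete label of this kind is what a class-conditioned generator would receive, whereas the continuous X-ray conditioning would pass the flux value itself.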

Abstract

Given the rarity of significant solar flares compared to smaller ones, training effective machine learning models for solar activity forecasting is challenging due to insufficient data. This study proposes using generative deep learning models, specifically a Denoising Diffusion Probabilistic Model (DDPM), to create synthetic images of solar phenomena, including flares of varying intensities. By employing a dataset from the AIA instrument aboard the SDO spacecraft, focusing on the 171 Å band that captures various solar activities, and classifying images with GOES X-ray measurements based on flare intensity, we aim to address the data scarcity issue. The DDPM's performance is evaluated using cluster metrics, Fréchet Inception Distance (FID), and F1-score, showcasing promising results in generating realistic solar imagery. We conduct two experiments: one to train a supervised classifier for event identification and another for basic flare prediction, demonstrating the value of synthetic data in managing imbalanced datasets. This research underscores the potential of DDPMs in solar data analysis and forecasting, suggesting further exploration into their capabilities for solar flare prediction and application in other deep learning and physical tasks.
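
To make the role of the F1-score on imbalanced flare classes concrete, here is a minimal sketch of a macro-averaged F1, in which every class contributes equally regardless of how many samples it has. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper's evaluation code; a class with no true or predicted samples is scored as 0.0 here by convention.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy imbalanced case: four A-class samples, two X-class samples,
    # with one X-class sample misclassified as A.
    y_true = ["A", "A", "A", "A", "X", "X"]
    y_pred = ["A", "A", "A", "A", "A", "X"]
    print(macro_f1(y_true, y_pred, labels=["A", "X"]))
```

In this toy case the accuracy is 5/6, but the macro F1 is lower (7/9) because the single missed X-class sample weighs heavily on the rare class, which is why a macro-averaged score is informative when large flares are scarce.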

Paper Structure (22 sections, 8 equations, 14 figures, 5 tables)

Figures (14)

  • Figure 1: Histogram distribution of the labelled dataset with the discrete GOES labels: A, B, C, M and X.
  • Figure 2: Sketch of the network trained with the discrete labels and the ceVAE embeddings to guide the diffusion. In the concatenation process l represents the discrete label and z the features of the ceVAE latent space.
  • Figure 3: t-SNE dimensionality-reduction technique applied to various latent spaces to determine which is most appropriate for cluster metrics. Panel a) shows the t-SNE of the CLIP latent space, panel b) the t-SNE of the latent space of a classifier and panel c) the latent space of a pretrained ceVAE.
  • Figure 4: Standard-deviation maps, comparing true images (left) and generated images (right) for each class. Panel a) represents the A class, panel b) the B class, panel c) the C class, panel d) the M class and panel e) the X class.
  • Figure 5: Batch of 25 generated images. The first two rows are generated with the discrete label model, the third and the fourth row with the X-ray model and the last row with the ceVAE embedding model. The first column shows the A class, the second column the B class, the third column the C class, the fourth column the M class and the fifth column the X class.
  • ...and 9 more figures