Solar synthetic imaging: Introducing denoising diffusion probabilistic models on SDO/AIA data
Francesco P. Ramunno, S. Hackstein, V. Kinakh, M. Drozdova, G. Quetant, A. Csillaghy, S. Voloshynovskiy
TL;DR
The paper tackles data scarcity in solar flare forecasting by using denoising diffusion probabilistic models (DDPMs) to generate labeled synthetic full-disc solar images from SDO/AIA 171 Å data. It compares three conditioning strategies (discrete GOES classes, continuous X-ray values, and ceVAE embeddings) and evaluates them with cluster metrics, Fréchet Inception Distance, and macro F1, finding that discrete GOES conditioning yields the best balance between realism and discriminability. The study shows that synthetic data can improve classifier performance on underrepresented flare classes and improve 24-hour flare-prediction metrics, which points to a practical role for diffusion-based data augmentation in heliophysics. Overall, the work establishes DDPMs as a promising tool for generating physically plausible solar imagery and supporting downstream forecasting and analysis tasks, with future work focusing on larger image sizes, broader data, and physics-grounded validation.
Abstract
Because significant solar flares are rare compared to smaller ones, training effective machine learning models for solar activity forecasting is hampered by insufficient data. This study proposes using generative deep learning models, specifically a Denoising Diffusion Probabilistic Model (DDPM), to create synthetic images of solar phenomena, including flares of varying intensities. To address the data scarcity issue, we use a dataset from the AIA instrument aboard the SDO spacecraft, focusing on the 171 Å band, which captures various solar activities, and label the images by flare intensity using GOES X-ray measurements. The DDPM's performance is evaluated using cluster metrics, Fréchet Inception Distance (FID), and F1-score, with promising results in generating realistic solar imagery. We conduct two experiments, one training a supervised classifier for event identification and another for basic flare prediction, which demonstrate the value of synthetic data in managing imbalanced datasets. This research underscores the potential of DDPMs in solar data analysis and forecasting, and suggests further exploration of their capabilities for solar flare prediction and for other deep learning and physical tasks.
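As an illustration of labeling images by GOES X-ray measurements, the sketch below maps a peak soft X-ray flux to the standard GOES letter class (A, B, C, M, X, each a decade in W/m^2). The function names `goes_class` and `goes_label` are hypothetical, and the exact class grouping and time window used in the study are not specified here, so this should be read as a minimal sketch of the standard classification rather than the authors' labeling pipeline.

```python
# Lower flux bounds (W/m^2, GOES 1-8 Angstrom band) for each letter class,
# ordered from strongest to weakest. Anything below 1e-7 is class A.
GOES_LOWER_BOUNDS = [
    ("X", 1e-4),
    ("M", 1e-5),
    ("C", 1e-6),
    ("B", 1e-7),
]

# Base flux used to compute the numeric multiplier within each class.
GOES_BASE_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_class(peak_flux: float) -> str:
    """Return the GOES letter class for a peak X-ray flux in W/m^2."""
    if peak_flux <= 0:
        raise ValueError("peak_flux must be positive")
    for letter, lower_bound in GOES_LOWER_BOUNDS:
        if peak_flux >= lower_bound:
            return letter
    return "A"


def goes_label(peak_flux: float) -> str:
    """Return a full label such as 'M2.5' (letter class plus multiplier)."""
    letter = goes_class(peak_flux)
    multiplier = peak_flux / GOES_BASE_FLUX[letter]
    return f"{letter}{multiplier:.1f}"


if __name__ == "__main__":
    # Illustrative flux values only, not taken from the paper's dataset.
    for flux in (3e-8, 5e-7, 4e-6, 2.5e-5, 2e-4):
        print(f"{flux:.1e} W/m^2 -> {goes_label(flux)}")
```

A discrete label of this kind can serve as the conditioning class for a generative model, while the raw flux value corresponds to the continuous alternative.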
