
Automatic detection of CMEs using synthetically-trained Mask R-CNN

Francisco A. Iglesias, Diego G. Lloveras, Florencia L. Cisterna, Hebe Cremades, Mariano Sanchez Toledo, Fernando M. López, Yasmin Machuca, Franco Manini, Andrés Asensio Ramos

TL;DR

This work tackles automatic segmentation of CMEs in coronagraph images by synthesizing a large CME dataset that combines real background coronal images with ray-traced GCS-based brightness, and by training a Mask R-CNN to perform instance segmentation of the CME outer envelope without using kinematic information. The model achieves a median $IoU$ of $0.98$ on synthetic validation data and $0.77$ on 115 real CME observations, demonstrating that synthetic-data training can yield robust, topologically connected CME masks and enable multi-CME discrimination across instruments. A CME-tracking step selects the most plausible mask per image in a time series, mitigating false positives and improving consistency. The approach offers a scalable path to automated CME catalogs and onboard-capable detection, with potential enhancements from incorporating temporal information and more realistic CME/coronal models.

Abstract

Coronal mass ejections (CMEs) are a major driver of space weather. To assess CME geoeffectiveness, among other scientific goals, it is necessary to reliably identify and characterize their morphology and kinematics in coronagraph images. Current methods of CME identification are either subject to human biases or identify CMEs poorly due to deficiencies in the automatic detection. In this work, we have trained the deep convolutional neural model Mask R-CNN to automatically segment the outer envelope of one or multiple CMEs present in a single difference coronagraph image. The empirical training dataset is composed of $10^5$ synthetic coronagraph images with known pixel-level CME segmentation masks. It is obtained by combining quiet coronagraph observations with synthetic white-light CMEs produced using the GCS geometric model and a ray-tracing technique. We found that our Mask R-CNN, trained on this model-based dataset, infers segmentation masks that are smooth and topologically connected. While the inferred masks are not representative of the detailed outer envelope of complex CMEs, the neural model can better differentiate a CME from other radially moving background/foreground features, segment multiple simultaneous CMEs that are close to each other, and work with images from different instruments. This is accomplished without relying on kinematic information, i.e., using only the information included in the single input difference image. We obtain a median $IoU=0.98$ for $1.6\times10^4$ synthetic validation images, and $IoU=0.77$ when compared with two independent manual segmentations of 115 observations acquired by the COR2-A, COR2-B and LASCO C2 coronagraphs. The methodology presented in this work can be used with other CME models to produce more realistic synthetic brightness images while preserving desired morphological features, and to obtain more robust and/or tailored segmentations.
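
The $IoU$ (intersection over union) scores quoted above compare an inferred mask with a reference mask pixel by pixel. The following is a minimal sketch of that metric, not the authors' evaluation code: the function names `iou` and `median_iou` are hypothetical, and scoring two empty masks as 1.0 is a convention assumed here for illustration.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def median_iou(pred_masks, ref_masks):
    """Median IoU over paired sequences of predicted and reference masks."""
    return float(np.median([iou(p, r) for p, r in zip(pred_masks, ref_masks)]))
```

With one predicted mask per validation image, `median_iou(predicted, reference)` gives a summary statistic of the same kind as the median values reported in the abstract.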

Paper Structure

This paper contains 11 sections, 2 equations, 17 figures.

Figures (17)

  • Figure 1: ML-based segmentation of CMEs. We summarize current approaches found in the literature, including supervised classification followed by unsupervised segmentation (A, blue arrows), supervised classification followed by supervised segmentation using manually labeled data (C, red arrows), and supervised 3D segmentation using manually labeled data (D, violet arrows). The supervised instance segmentation approach used in this work (B, green arrows) employs synthetic data instead of manual labeling. We sketch a simplified version of the data flow between blocks representing input/output (orange), different NNs (blue), and other processing algorithms (grey). The designation of each approach and the name of the neural model used are annotated in bold. See the text for extra details.
  • Figure 2: The GCS geometric model, including its face-on view (panel a) and a 3D perspective showing the source region on the solar surface (panel b). The 6 model parameters are the source region Stonyhurst longitude ($\phi$) and heliographic latitude ($\theta$), tilt angle with respect to the local parallel ($\gamma$), height of the apex ($h$), aspect ratio ($\kappa$), and half angular separation of the legs ($\alpha$). Adapted from Thernisien et al. (2009).
  • Figure 3: Generation of the empirical synthetic dataset. Each dataset element is composed of a differential synthetic coronagraph image and its corresponding CME and occulter binary segmentation masks (blue boxes). These are produced using a real observed coronal background (with no CME) and synthetic brightness images obtained from a ray-tracing simulation based on a random GCS model (green boxes). The red boxes show intermediate steps; see the text for extra details.
  • Figure 4: Architecture of the Mask R-CNN model (He et al. 2017) used for instance segmentation of CMEs in coronagraphic images. The backbone is a ResNet-50 with initial weights trained on the COCO dataset. We use the object classes: Background, CME and Occulter.
  • Figure 5: Mask R-CNN model training on synthetic coronagraphic images. We show the training loss (solid line) and mean validation IoU (dashed line, refer to the right vertical axis) as a function of the number of training images. The gray band represents the 25th-75th percentile of the $IoU$, which is computed for the validation set. Black vertical segments on the horizontal axis mark each training epoch.
  • ...and 12 more figures
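
The dataset element described in the Figure 3 caption (a differential image plus CME and occulter binary masks) can be illustrated with a small sketch. This is not the authors' pipeline, whose intermediate steps are only described in the paper text. The function names, the simple additive composition, the centered circular occulter, and the relative brightness threshold used to derive the CME mask are all assumptions made for illustration.

```python
import numpy as np


def occulter_mask(shape, radius_px):
    """Binary mask of a circular occulter centered in the image (assumed geometry)."""
    ny, nx = shape
    y, x = np.ogrid[:ny, :nx]
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    return (y - cy) ** 2 + (x - cx) ** 2 <= radius_px ** 2


def make_dataset_element(background_diff, cme_brightness, occulter_radius_px,
                         scale=1.0, rel_threshold=0.05):
    """Combine a CME-free background difference image with a synthetic CME.

    Returns (image, cme_mask, occ_mask). All parameters are hypothetical:
    `scale` weights the synthetic brightness, and `rel_threshold` sets the
    CME mask as a fraction of the peak synthetic brightness.
    """
    background = np.asarray(background_diff, dtype=float)
    cme = np.asarray(cme_brightness, dtype=float)
    if background.shape != cme.shape:
        raise ValueError("background and CME images must have the same shape")

    occ_mask = occulter_mask(background.shape, occulter_radius_px)

    # Additive composition (assumption), with the occulted region blanked.
    image = background + scale * cme
    image[occ_mask] = 0.0

    peak = cme.max()
    if peak > 0:
        cme_mask = (cme > rel_threshold * peak) & ~occ_mask
    else:
        cme_mask = np.zeros(background.shape, dtype=bool)
    return image, cme_mask, occ_mask
```

In this sketch `cme_brightness` stands in for the output of the ray-tracing simulation; any array of the same shape as the background can be passed to try it out.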