The CHASM-SWPC Dataset for Coronal Hole Detection & Analysis

Cutter Beck, Evan Smith, Khagendra Katuwal, Rudra Kafle, Jacob Whitehill

TL;DR

This work addresses the need for high-quality ground-truth data to train coronal hole detectors by digitizing NOAA SWPC synoptic maps into precise CH masks using a semi-automatic pipeline that leverages the Segment Anything Model. It introduces CHASM and the CHASM-SWPC dataset, along with multi-wavelength CHRONNOS training, demonstrating that models trained on expert-labeled CHASM-SWPC data outperform those trained on SPoCA-CH pseudo-labels by substantial margins. The study also explores dataset variants (1407, 1111, 967) to balance data quantity and quality, and provides insights into inter-annotator agreement, boundary accuracy, and potential biases. Overall, CHASM enables high-fidelity coronal hole annotations and improves automated detection performance, with broad implications for space weather forecasting and solar physics research.

Abstract

Coronal holes (CHs) are low-activity, low-density solar coronal regions with open magnetic field lines (Cranmer 2009). In the extreme ultraviolet (EUV) spectrum, CHs appear as dark patches. Using daily hand-drawn maps from the Space Weather Prediction Center (SWPC), we developed a semi-automated pipeline to digitize the SWPC maps into binary segmentation masks. The resulting masks constitute the CHASM-SWPC dataset, a high-quality dataset for training and testing automated CH detection models, which is released with this paper. We developed CHASM (Coronal Hole Annotation using Semi-automatic Methods), a software tool for semi-automatic annotation that enables users to rapidly and accurately annotate SWPC maps. The CHASM tool enabled us to annotate 1,111 CH masks, which comprise the CHASM-SWPC-1111 dataset. We then trained multiple neural networks with the CHRONNOS (Coronal Hole RecOgnition Neural Network Over multi-Spectral-data) architecture (Jarolim et al. 2021) on the CHASM-SWPC dataset and compared their performance. When evaluated on the CHASM-SWPC-1111 test set, the CHRONNOS network trained on these data achieved an accuracy of 0.9805, a True Skill Statistic (TSS) of 0.6807, and an intersection-over-union (IoU) of 0.5668. All three values are higher than those of the original pretrained CHRONNOS model (Jarolim et al. 2021), which achieved an accuracy of 0.9708, a TSS of 0.6749, and an IoU of 0.4805 on the same test set.
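For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, TSS, and IoU can be computed from a predicted and a reference binary mask, using their standard confusion-matrix definitions. It is an illustration only, not the evaluation code used in the paper; the function name `segmentation_metrics` and the toy masks are hypothetical, and the paper's own evaluation may differ in details such as thresholding or restricting pixels to the solar disk.

```python
import math


def segmentation_metrics(pred, truth):
    """Return accuracy, TSS, and IoU for two binary masks of equal shape.

    `pred` and `truth` are nested lists (rows of 0/1 values), where 1 marks
    a coronal-hole pixel. This helper is a hypothetical illustration.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # CH pixel correctly detected
            elif p and not t:
                fp += 1          # non-CH pixel labeled as CH
            elif t:
                fn += 1          # CH pixel missed
            else:
                tn += 1          # non-CH pixel correctly rejected

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else math.nan

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    # TSS = true positive rate minus false positive rate.
    tss = ratio(tp, tp + fn) - ratio(fp, fp + tn)
    # IoU = overlap of the two CH regions divided by their union.
    iou = ratio(tp, tp + fp + fn)
    return {"accuracy": accuracy, "tss": tss, "iou": iou}


# Toy 2x4 masks, chosen only to exercise every confusion-matrix cell.
example_pred = [[1, 1, 0, 0], [0, 1, 0, 0]]
example_truth = [[1, 0, 0, 0], [0, 1, 1, 0]]
metrics = segmentation_metrics(example_pred, example_truth)
```

Because CH pixels cover only a small fraction of the solar disk, accuracy is dominated by the correctly rejected background pixels, which is one reason TSS and IoU are reported alongside it.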

Paper Structure

This paper contains 23 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: An EUV image of the Sun from 2019-01-01 showing a dark coronal hole in the center with intensity under 200 DN (digital number) (left) and the corresponding magnetogram (right). Source: Virtual Solar Observatory, accessed with SunPy.
  • Figure 2: CH segmentation pipeline example for an observation made on 2019-01-01. From left to right: SWPC synoptic map with hand-drawn CHs; SAM-generated segmentation masks overlaid on the SWPC map; cropped SWPC solar disk extracted using Hough circle detection; CHASM binary mask with CH pixels in black and the Hough circle in blue.
  • Figure 3: The CHASM tool to digitize SWPC maps.
  • Figure 4: Comparison of the SWPC synoptic drawings (row 1), corresponding 193 Å imagery (row 2), SPoCA-CH labels (row 3), CHASM-SWPC-1111 labels (row 4), ground truth segmentations from SPoCA-CH (row 5) (Jarolim et al. 2021), CHRONNOS output when trained on CHASM-SWPC-1111 (row 6), and output of the pre-trained CHRONNOS trained on SPoCA-CH (row 7) (Jarolim et al. 2021). The observation from 2016-12-07 was flagged as "bad" during CHASM annotation and thus not used for training.
  • Figure 5: Histogram of prediction values from the CHRONNOS model trained on CHASM-SWPC-1111, from 2017-01-01 through 2017-01-10, for the Full Disk and Central Band.