The CHASM-SWPC Dataset for Coronal Hole Detection & Analysis
Cutter Beck, Evan Smith, Khagendra Katuwal, Rudra Kafle, Jacob Whitehill
TL;DR
This work addresses the need for high-quality ground-truth data to train coronal hole detectors by digitizing NOAA SWPC synoptic maps into precise CH masks using a semi-automatic pipeline that leverages the Segment Anything Model. It introduces CHASM and the CHASM-SWPC dataset, along with multi-wavelength CHRONNOS training, demonstrating that models trained on expert-labeled CHASM-SWPC data outperform those trained on SPoCA-CH pseudo-labels by substantial margins. The study also explores dataset variants (1407, 1111, 967) to balance data quantity and quality, and provides insights into inter-annotator agreement, boundary accuracy, and potential biases. Overall, CHASM enables high-fidelity coronal hole annotations and improves automated detection performance, with broad implications for space weather forecasting and solar physics research.
Abstract
Coronal holes (CHs) are low-activity, low-density regions of the solar corona with open magnetic field lines (Cranmer 2009). In the extreme ultraviolet (EUV) spectrum, CHs appear as dark patches. Starting from the daily hand-drawn maps produced by the Space Weather Prediction Center (SWPC), we developed a semi-automated pipeline that digitizes these maps into binary segmentation masks. The resulting masks constitute the CHASM-SWPC dataset, a high-quality dataset for training and testing automated CH detection models, which is released with this paper. We developed CHASM (Coronal Hole Annotation using Semi-automatic Methods), a software tool for semi-automatic annotation that lets users annotate SWPC maps rapidly and accurately. Using CHASM, we annotated 1,111 CH masks, which comprise the CHASM-SWPC-1111 dataset. We then trained multiple neural networks with the CHRONNOS (Coronal Hole RecOgnition Neural Network Over multi-Spectral-data) architecture (Jarolim et al. 2021) on the CHASM-SWPC dataset and compared their performance. On the CHASM-SWPC-1111 test set, a CHRONNOS network trained on these data achieved an accuracy of 0.9805, a True Skill Statistic (TSS) of 0.6807, and an intersection-over-union (IoU) of 0.5668. Each score is higher than that of the original pretrained CHRONNOS model (Jarolim et al. 2021), which achieved an accuracy of 0.9708, a TSS of 0.6749, and an IoU of 0.4805 on the same test set.
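To make the three reported metrics concrete, the following minimal sketch computes accuracy, TSS, and IoU for a pair of binary masks using their standard confusion-matrix definitions. It is an illustration only: the function names, the nested-list mask representation, and the choice to pool pixels over a single mask pair are assumptions, and the paper's actual evaluation code (for example, whether scores are averaged per image or pooled over the test set) is not described here.

```python
import math


def confusion_counts(pred, truth):
    """Count (TP, FP, FN, TN) over two equally shaped binary masks.

    Masks are given as rows of 0/1 values (hypothetical representation;
    1 = coronal hole pixel, 0 = background).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else math.nan


def segmentation_metrics(pred, truth):
    """Return accuracy, TSS, and IoU for a predicted vs. reference mask."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    # TSS = true positive rate - false positive rate
    tss = _ratio(tp, tp + fn) - _ratio(fp, fp + tn)
    # IoU = |pred AND truth| / |pred OR truth| for the positive class
    iou = _ratio(tp, tp + fp + fn)
    return {"accuracy": accuracy, "tss": tss, "iou": iou}


# Toy example: TP = 1, FP = 1, FN = 1, TN = 5
pred = [[1, 1, 0, 0],
        [0, 0, 0, 0]]
truth = [[1, 0, 1, 0],
         [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

Because CHs usually cover a small fraction of the solar disk, true negatives dominate the pixel counts. This is one reason accuracy can be very high for both models (0.9805 vs. 0.9708) while IoU, which ignores true negatives, separates them more clearly (0.5668 vs. 0.4805).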
