
Automated Identification and Segmentation of H I Sources in CRAFTS Using Deep Learning Method

Zihao Song, Huaxi Chen, Donghui Quan, Di Li, Yinghui Zheng, Shulei Ni, Yunchuan Chen, Yun Zheng

TL;DR

A machine learning-based method for extracting H I sources from the 3D spectral data of the Commensal Radio Astronomy FAST Survey (CRAFTS). It uses the 3D-Unet segmentation architecture with an elongated convolution kernel to capture the intricate structures of H I sources.
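The intuition behind an elongated kernel can be shown with a toy example. The sketch below is not the Unet-LK implementation. It is a plain-Python 3D cross-correlation (the operation inside a convolutional layer), and the kernel sizes (7, 1, 1) and (3, 3, 3), the fixed box-shaped weights, and the synthetic cube are all hypothetical choices made for illustration. A source that spans many frequency channels but few spatial pixels produces a much stronger response from a kernel stretched along the frequency axis than from a cubic kernel with the same normalization.

```python
from typing import List

# A cube is indexed as cube[channel][y][x] (frequency first, then two spatial axes).
Cube = List[List[List[float]]]


def conv3d_valid(cube: Cube, kernel: Cube) -> Cube:
    """Naive 3D cross-correlation with stride 1 and no padding ("valid" mode)."""
    kf, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    nf, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    out: Cube = []
    for f in range(nf - kf + 1):
        plane = []
        for y in range(ny - ky + 1):
            row = []
            for x in range(nx - kx + 1):
                total = 0.0
                for df in range(kf):
                    for dy in range(ky):
                        for dx in range(kx):
                            total += kernel[df][dy][dx] * cube[f + df][y + dy][x + dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


def box_kernel(kf: int, ky: int, kx: int) -> Cube:
    """Uniform kernel whose weights sum to 1 (hypothetical, for illustration only)."""
    w = 1.0 / (kf * ky * kx)
    return [[[w] * kx for _ in range(ky)] for _ in range(kf)]


def peak(cube: Cube) -> float:
    """Largest value anywhere in the cube."""
    return max(v for plane in cube for row in plane for v in row)


def shape(cube: Cube) -> tuple:
    """(channels, rows, columns) of a nested-list cube."""
    return (len(cube), len(cube[0]), len(cube[0][0]))


# Synthetic 12-channel, 5x5-pixel cube with a line-like source:
# a single spatial pixel (2, 2) that is bright in channels 2..9.
toy = [[[0.0] * 5 for _ in range(5)] for _ in range(12)]
for ch in range(2, 10):
    toy[ch][2][2] = 1.0

elongated = conv3d_valid(toy, box_kernel(7, 1, 1))  # long along frequency
cubic = conv3d_valid(toy, box_kernel(3, 3, 3))      # isotropic

print(shape(elongated), round(peak(elongated), 3))  # (6, 5, 5) 1.0
print(shape(cubic), round(peak(cubic), 3))          # (10, 3, 3) 0.111
```

In a trained network the kernel weights are learned rather than fixed, so this only illustrates why an anisotropic receptive field suits line-like signals; it does not reproduce the paper's actual kernel sizes.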

Abstract

Identifying neutral hydrogen (H I) galaxies in observational data is a significant challenge for H I galaxy surveys. With advances in observational technology, and especially with large telescope projects such as FAST and SKA, the rapidly growing data volume places new demands on the efficiency and accuracy of data processing. To address this challenge, we present a machine learning-based method for extracting H I sources from the three-dimensional (3D) spectral data of the Commensal Radio Astronomy FAST Survey (CRAFTS). We have assembled a dedicated dataset, HISF, rich in H I sources and designed specifically to support the detection task. Our model, Unet-LK, builds on the 3D-Unet segmentation architecture and uses an elongated convolution kernel to capture the intricate structures of H I sources. This approach yields reliable identification and segmentation of H I sources, with a recall of 91.6% and an accuracy of 95.7%. These results demonstrate the robustness of our dataset and the effectiveness of the proposed network architecture for the precise identification of H I sources. Our code and dataset are publicly available at https://github.com/fishszh/HISF.
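The abstract reports a recall of 91.6% and an accuracy of 95.7%. The sketch below shows the standard confusion-matrix definitions of these metrics, recall = TP / (TP + FN) and accuracy = (TP + TN) / (TP + TN + FP + FN), applied to flattened binary masks. Whether the paper evaluates them per voxel or per detected source is not stated in this summary, so the voxel-level framing, the function names, and the toy masks are assumptions made for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/FN/TN between two flattened binary masks (1 = H I source)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = len(pred) - tp - fp - fn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def recall(c: Dict[str, int]) -> float:
    """Fraction of true source elements that the model recovers."""
    denom = c["tp"] + c["fn"]
    return c["tp"] / denom if denom else 0.0


def accuracy(c: Dict[str, int]) -> float:
    """Fraction of all elements (source and background) labelled correctly."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return (c["tp"] + c["tn"]) / total if total else 0.0


# Toy example: 10 source elements among 100; the model misses 1 and adds 2 false ones.
truth = [1] * 10 + [0] * 90
pred = [1] * 9 + [0] + [1] * 2 + [0] * 88
counts = confusion_counts(pred, truth)
print(counts)            # {'tp': 9, 'fp': 2, 'fn': 1, 'tn': 88}
print(recall(counts))    # 0.9
print(accuracy(counts))  # 0.97
```

Because background elements usually dominate a spectral cube, an element-level accuracy can exceed recall even when several sources are missed, as the toy numbers show; reporting both metrics guards against that imbalance.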

Paper Structure (8 sections, 4 equations, 8 figures, 2 tables)


Figures (8)

  • Figure 1: Data processing pipeline for H I source identification. (a) Our CRAFTS 3D spectral cubes are produced from the raw data through processing steps such as RFI flagging and Doppler correction, which ensure the accuracy and reliability of the observations; the details of these steps are beyond the scope of this paper. (b) Given the 3D spectral cubes, our method begins with expert source identification, followed by manual annotation in 3D Slicer. The resulting labeled dataset is then used to train the model for H I source recognition. The "Foreground" subplot illustrates the distribution of H I source signals within the cube, with an inset that magnifies these signals. The "mask" subplot delineates the regions of the annotated H I sources.
  • Figure 2: The region between the red lines shows the overall sky coverage of CRAFTS. The regions labeled R1 and R2 are the areas in which we performed data annotation.
  • Figure 3: 3D Slicer is a free, open-source software package for visualization and image analysis. This example shows it being used to visualize 3D spectral cubes and annotate H I sources. The Segment Editor panel is used to create and refine segmentations manually (paint, draw, …) on the orthogonal planes of the 3D spectral cube, and the top-right panel displays the resulting 3D segmentations.
  • Figure 4: Distribution of H I source extents along the R.A., Dec., and frequency axes, in pixel units, with a spatial resolution of 0.0167 degrees/pixel and a frequency resolution of 7.6 kHz/pixel. The H I sources are strongly elongated: their extent in frequency pixels markedly exceeds their extent in spatial pixels.
  • Figure 5: The model pipeline for identifying H I sources, comprising data pre-processing, model training, and post-processing to refine the results. Two strategies, rebin and crop, can be applied individually or in combination, as summarized in the performance table.
  • ...and 3 more figures
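The Figure 4 resolutions and the Figure 5 rebin and crop strategies can be made concrete with a small sketch. It assumes that "rebin" averages groups of adjacent frequency channels and that "crop" tiles the cube into fixed-size sub-cubes; the rebin factor of 2, the (2, 2, 2) patch size, and the function names are hypothetical. It also converts the 7.6 kHz channel width into a velocity width using the radio velocity convention, Δv = c·Δν/ν₀, with the H I rest frequency ν₀ = 1420.405751768 MHz, and the 0.0167 degree pixel into arcminutes.

```python
from typing import List

Cube = List[List[List[float]]]  # cube[channel][y][x]

C_KMS = 299792.458              # speed of light in km/s
HI_REST_MHZ = 1420.405751768    # H I 21 cm rest frequency in MHz
CHANNEL_KHZ = 7.6               # frequency resolution quoted in Figure 4
PIXEL_DEG = 0.0167              # spatial resolution quoted in Figure 4


def rebin_frequency(cube: Cube, factor: int) -> Cube:
    """Average each group of `factor` adjacent channels; leftover channels are dropped."""
    n_bins = len(cube) // factor
    ny, nx = len(cube[0]), len(cube[0][0])
    out: Cube = []
    for b in range(n_bins):
        group = cube[b * factor:(b + 1) * factor]
        out.append([[sum(ch[y][x] for ch in group) / factor for x in range(nx)]
                    for y in range(ny)])
    return out


def crop_patches(cube: Cube, pf: int, py: int, px: int) -> List[Cube]:
    """Tile the cube into non-overlapping (pf, py, px) sub-cubes; partial edge tiles are dropped."""
    nf, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    patches: List[Cube] = []
    for f0 in range(0, nf - pf + 1, pf):
        for y0 in range(0, ny - py + 1, py):
            for x0 in range(0, nx - px + 1, px):
                patches.append([[row[x0:x0 + px] for row in plane[y0:y0 + py]]
                                for plane in cube[f0:f0 + pf]])
    return patches


def channel_velocity_kms(channel_khz: float) -> float:
    """Velocity width of one channel in the radio convention: dv = c * dnu / nu0."""
    return C_KMS * (channel_khz / 1000.0) / HI_REST_MHZ


# Toy 8-channel, 4x4 cube whose value equals its channel index.
toy = [[[float(ch)] * 4 for _ in range(4)] for ch in range(8)]
rebinned = rebin_frequency(toy, factor=2)        # 4 channels remain
patches = crop_patches(rebinned, 2, 2, 2)        # 2 x 2 x 2 = 8 patches

print([plane[0][0] for plane in rebinned])       # [0.5, 2.5, 4.5, 6.5]
print(len(patches))                              # 8
print(round(channel_velocity_kms(CHANNEL_KHZ), 2))      # 1.6  (km/s per native channel)
print(round(channel_velocity_kms(2 * CHANNEL_KHZ), 2))  # 3.21 (km/s after rebinning by 2)
print(round(PIXEL_DEG * 60, 2))                  # 1.0  (arcmin per pixel)
```

Rebinning trades spectral resolution for a smaller input along the long frequency axis, while cropping keeps full resolution but limits each network input to a sub-cube; which combination works best is what the performance table compares.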