Classification of compact radio sources in the Galactic plane with supervised machine learning

S. Riggi, G. Umana, C. Trigilio, C. Bordiu, F. Bufano, A. Ingallinera, F. Cavallaro, Y. Gordon, R. P. Norris, G. Gürkan, P. Leto, C. Buemi, S. Loru, A. M. Hopkins, M. D. Filipović, T. Cecconello

TL;DR

The paper tackles automated classification of compact radio sources in the Galactic plane for SKA-era surveys by building a large, multi-wavelength labelled dataset (~20k images) from ASKAP and IR data, and evaluating two supervised approaches: LightGBM on engineered colour and spectral features, and CNNs on image data. LightGBM achieves strong Galactic-vs-extragalactic separation (F1-score > 0.9) and, with added far-infrared data and the radio spectral index, substantially improves per-class performance for Galactic objects (notably PNe, HII regions, and pulsars); CNNs provide competitive multiclass performance and better Galactic-class discrimination in some cases. The authors release the sclassifier tool and trained models to support future SKA analyses, and discuss limitations related to class-label reliability and sample sizes, outlining plans for incorporating additional classes (e.g., star-forming galaxies) and unsupervised analyses. Overall, the work demonstrates the value of combining multi-wavelength features with supervised learning for scalable radio-source classification in crowded Galactic-plane surveys.

Abstract

Generation of science-ready data from processed data products is one of the major challenges in next-generation radio continuum surveys with the Square Kilometre Array (SKA) and its precursors, due to the expected data volume and the need to achieve a high degree of automated processing. Source extraction, characterization, and classification are the major stages involved in this process. In this work we focus on the classification of compact radio sources in the Galactic plane using both radio and infrared images as inputs. To this aim, we produced a curated dataset of ~20,000 images of compact sources of different astronomical classes, obtained from past radio and infrared surveys, and novel radio data from pilot surveys carried out with the Australian SKA Pathfinder (ASKAP). Radio spectral index information was also obtained for a subset of the data. We then trained two different classifiers on the produced dataset. The first model uses gradient-boosted decision trees and is trained on a set of pre-computed features derived from the data, which include radio-infrared colour indices and the radio spectral index. The second model is trained directly on multi-channel images, employing convolutional neural networks. Using a completely supervised procedure, we obtained a high classification accuracy (F1-score>90%) for separating Galactic objects from the extragalactic background. Individual class discrimination performances, ranging from 60% to 75%, increased by 10% when adding far-infrared and spectral index information, with extragalactic objects, PNe and HII regions identified with higher accuracies. The implemented tools and trained models were publicly released, and made available to the radioastronomical community for future application on new radio data.
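
To make the quoted metric concrete, the following is a minimal sketch of how a per-class F1-score can be computed from true and predicted labels. It is not taken from the released tools: the function names, the toy labels, and the use of an unweighted (macro) average across classes are assumptions made for illustration only.

```python
def f1_score(y_true, y_pred, positive):
    """F1 = 2PR / (P + R), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    classes = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)


# Toy binary example (hypothetical labels, not data from the paper).
y_true = ["GAL", "GAL", "GAL", "EXGAL", "EXGAL", "EXGAL", "EXGAL", "EXGAL"]
y_pred = ["GAL", "GAL", "EXGAL", "EXGAL", "EXGAL", "EXGAL", "EXGAL", "GAL"]

f1_gal = f1_score(y_true, y_pred, "GAL")
f1_exgal = f1_score(y_true, y_pred, "EXGAL")
f1_macro = macro_f1(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it penalises a classifier that reaches high recall on the rare Galactic classes only by mislabelling many extragalactic sources, which is why it is a more informative summary than plain accuracy on an imbalanced sample.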

Paper Structure (32 sections, 20 figures, 9 tables)

Figures (20)

  • Figure 1: Template source (G324.161+00.264, HII region) from the dataset, observed in seven bands (3.4 µm, 4.6 µm, 8 µm, 12 µm, 22 µm, 70 µm, and ASKAP radio at 944 MHz), shown in the panels from left to right.
  • Figure 2: Scatter plots of representative infrared/radio colour indices computed over the entire dataset for images with detected sources in both the radio and infrared channels (IoUs>0). Radio flux densities are obtained at different frequencies ranging from 0.912 GHz (ASKAP Early Science survey data) to 5.8 GHz (GLOSTAR). See the section on observations for details on survey frequencies.
  • Figure 3: Radio spectral indices measured for different source classes with the T-T plot method. Spectral indices for RG and QSO sources were computed using RACS-FIRST radio frequencies (887.5–1400 MHz). Indices for the remaining Galactic classes were computed from selected survey sub-bands (when available), i.e. 871–1480 MHz (ASKAP Scorpio), 1060–1440 MHz (THOR), and 4240–4670 MHz (GLOSTAR).
  • Figure 4: Average F1-score metric achieved by the LightGBM trained classifier for binary classification of Galactic and Extragalactic source groups and for multiclass classification, computed over five "mixed" survey test sets (labelled as "mixed" and shown with filled markers) and pure ASKAP test sets (labelled as "askap" and shown with open markers). The error bars are the F1-score standard deviations obtained over the five test sets. Results obtained over the 5-band (radio+MIR) datasets without and with the spectral index ($\alpha$) information are respectively shown with black dots and green triangles, while results obtained over the 7-band (radio+MIR+FIR) datasets are respectively shown with red squares and blue inverted triangles.
  • Figure 5: Confusion matrix of the trained LightGBM classifier obtained over the 5-band (radio+MIR) pure ASKAP test datasets.
  • ...and 15 more figures
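
The T-T plot method named in the Figure 3 caption can be illustrated with a short sketch. The idea is to plot pixel brightness at one frequency against pixel brightness at another and read the spectral index off the slope of a straight-line fit, which makes the estimate insensitive to a constant background offset in either map. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the convention S ∝ ν^α, uses ordinary least squares, and runs on noise-free synthetic pixels. All function and variable names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tt_spectral_index(pix_lo, pix_hi, nu_lo, nu_hi):
    """Spectral index from the slope of pixel values at nu_hi vs nu_lo.

    Assumes S ∝ nu^alpha, so S_hi = (nu_hi / nu_lo)^alpha * S_lo + offset
    and alpha = ln(slope) / ln(nu_hi / nu_lo).
    """
    slope, _ = linear_fit(pix_lo, pix_hi)
    return math.log(slope) / math.log(nu_hi / nu_lo)


# Synthetic test: pixels with a known index and a constant background offset.
alpha_true = -0.7
nu_lo, nu_hi = 887.5, 1400.0          # MHz, the RACS-FIRST pair in the caption
pix_lo = [0.5 + 0.1 * i for i in range(20)]
offset = 0.3                          # hypothetical background level
pix_hi = [(nu_hi / nu_lo) ** alpha_true * s + offset for s in pix_lo]

alpha_est = tt_spectral_index(pix_lo, pix_hi, nu_lo, nu_hi)
```

In this noise-free case the fit recovers the input index despite the offset, which a simple ratio of the two flux densities would not. With real data, noise on both axes biases an ordinary least-squares slope, so a fit that accounts for errors in both variables would be a natural refinement.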