Table of Contents
Fetching ...

RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey

Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, Miranda Yew, Evan J. Crawford, Bärbel Koribalski, Miroslav D. Filipović, Anna D. Kapinśka, Stanislav Shabala, Tessa Vernstrom, Joshua R. Marvil

TL;DR

The study introduces Gal-DINO, a Transformer-based, multimodal detection pipeline that jointly identifies radio galaxy morphologies, bounding boxes, and infrared host keypoints to construct the first EMU-PS radio galaxy catalogue. By training on ~5{,}000 labeled galaxies (extended, compact, and Pec morphologies) and using 8-channel radio+infrared inputs, the method achieves strong localization and host-prediction performance, with $AP_{50}$ for bounding boxes of $73.2\%$ and $AP_{50}$ for keypoints of $71.7\%$, while 99\% of central galaxies have $IoU>0.5$ and 98\% have host keypoints within $<3^{\prime\prime}$. The resulting consolidated catalogue contains 211{,}625 radio galaxies (including 201{,}211 compact and 10{,}414 extended), with 73\% infrared cross-matches to CatWISE and substantial cross-identifications in optical surveys, enabling robust multiwavelength studies. This pipeline markedly reduces manual labour in cataloguing multi-component radio sources and provides a scalable path for upcoming SKA-era surveys, supported by publicly available data and code.

Abstract

We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved. The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.

RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey

TL;DR

The study introduces Gal-DINO, a Transformer-based, multimodal detection pipeline that jointly identifies radio galaxy morphologies, bounding boxes, and infrared host keypoints to construct the first EMU-PS radio galaxy catalogue. By training on ~5{,}000 labeled galaxies (extended, compact, and Pec morphologies) and using 8-channel radio+infrared inputs, the method achieves strong localization and host-prediction performance, with for bounding boxes of and for keypoints of , while 99\% of central galaxies have and 98\% have host keypoints within . The resulting consolidated catalogue contains 211{,}625 radio galaxies (including 201{,}211 compact and 10{,}414 extended), with 73\% infrared cross-matches to CatWISE and substantial cross-identifications in optical surveys, enabling robust multiwavelength studies. This pipeline markedly reduces manual labour in cataloguing multi-component radio sources and provides a scalable path for upcoming SKA-era surveys, supported by publicly available data and code.

Abstract

We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved. The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.
Paper Structure (24 sections, 4 equations, 12 figures, 5 tables)

This paper contains 24 sections, 4 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Examples of the radio (left panels) and corresponding infrared (right panels) images, as described by the column titles. Each of these images has a frame size of $8^{\prime} \times 8^{\prime}$ in the sky ($240 \times 240$ pixels). In the radio images, we display classes and bounding boxes for radio galaxies encapsulating all their components. Here, the 'FR-X' type is positioned between the FR-I and FR-II categories, 'R' denotes resolved radio sources with one visible peak, 'C' represents compact unresolved radio sources, and 'Pec' refers to peculiar or other rare radio morphologies (see Section \ref{['SEC:RadioIrAnnotations']} for details). On the infrared images, circles indicate the positions of host galaxies.
  • Figure 2: Shown are the dataset split distributions, depicting the distributions of extended radio galaxies in a single cutout (left), their respective categories (middle), and the occupied area per radio galaxy (A; right). The tables below the figures provide detailed counts of radio morphologies in the training, validation, and test sets. See Section \ref{['SEC:Datastats']} for more details. Note that each radio morphology has a corresponding infrared host, so the counts here also represent the number of corresponding infrared hosts.
  • Figure 3: Shown is an example of an 8-channel image used for the training and evaluation of the Gal-DINO network. The first 7 channels contain data from the radio FITS file, representing the extended radio galaxy with clipping between the 50th percentile level and 7 specific maxima, corresponding to the 95th, 99th, 99.2th, 99.5th, 99.7th, 99.9th, and 99.99th percentile levels. The 8th channel, in the bottom right, displays the corresponding pre-processed infrared image. The bounding box and keypoint annotations are not depicted here for brevity; examples of these annotations are shown in Figure \ref{['FIG:RadioGalaxies8arcm']} on the radio (with maxima at the 95th percentile level) and infrared images, respectively.
  • Figure 4: Presented is the normalized confusion matrix for the Gal-DINO detection model. Each matrix is normalized based on the total number of galaxies within its corresponding class. The diagonal entries denote true positive (TP) instances, representing objects correctly detected with an IoU and OKS threshold surpassing 0.5 compared to the ground truth instances, and a confidence threshold of 0.25. False positive (FP) instances correspond to model detections lacking corresponding ground truth instances, while false negative (FN) instances signify objects that the model failed to detect at the same IoU and OKS thresholds, along with a confidence threshold of 0.25.
  • Figure 5: An overview of the catalogue construction pipeline. The process initiates with obtaining predictions from the Gal-DINO model for all radio and infrared cutouts centred at the components in the Selavy catalogue. Subsequently, a dictionary of predictions is generated for the central sources within these cutouts. The consolidated catalogue is then formed by calibrating the confidence scores in the dictionary, organizing them in descending order, and systematically consolidating and removing entries from the Selavy catalogue based on decreasing score values. We refer the readers to Section \ref{['SEC:Catalog']} for further details.
  • ...and 7 more figures