RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey
Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, Miranda Yew, Evan J. Crawford, Bärbel Koribalski, Miroslav D. Filipović, Anna D. Kapińska, Stanislav Shabala, Tessa Vernstrom, Joshua R. Marvil
TL;DR
The study introduces Gal-DINO, a Transformer-based, multimodal detection pipeline that jointly predicts radio galaxy morphology classes, bounding boxes, and infrared host keypoints, and uses it to construct the first EMU-PS radio galaxy catalogue. The network is trained on ~5,000 labelled galaxies (extended, compact, and peculiar morphologies) using 8-channel radio+infrared inputs. It reaches an $AP_{50}$ of 73.2% for bounding boxes and 71.7% for keypoints; in the evaluation set, 99% of central galaxies have $IoU > 0.5$ and 98% of predicted host keypoints lie within $3^{\prime\prime}$ of the ground truth. The resulting consolidated catalogue contains 211,625 radio galaxies (201,211 compact and 10,414 extended), with CatWISE infrared cross-matches for 73% and photometric redshifts for 36%, enabling multiwavelength studies. The pipeline markedly reduces the manual labour of cataloguing multi-component radio sources and offers a scalable path for upcoming SKA-era surveys, supported by publicly available data and code.
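The two localization figures quoted above rest on simple geometric measures. The following is a minimal Python sketch of both, not the authors' evaluation code: the function names, the `(x_min, y_min, x_max, y_max)` pixel box convention, and the `pixel_scale_arcsec` value used in the example are all assumptions made for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) in pixels (assumed convention).
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region, clipped at zero if disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def keypoint_offset_arcsec(pred_xy, true_xy, pixel_scale_arcsec):
    """Distance between predicted and true host positions, in arcseconds.

    Uses a flat-sky approximation, which is adequate over a small cutout.
    """
    dx = pred_xy[0] - true_xy[0]
    dy = pred_xy[1] - true_xy[1]
    return math.hypot(dx, dy) * pixel_scale_arcsec


if __name__ == "__main__":
    # Hypothetical prediction and ground truth for one cutout.
    pred_box, true_box = (10, 10, 50, 40), (12, 11, 52, 42)
    pred_host, true_host = (30.0, 25.0), (31.0, 25.0)
    pixel_scale = 2.0  # arcsec per pixel; illustrative value only

    print("IoU > 0.5:", iou(pred_box, true_box) > 0.5)
    print("host within 3 arcsec:",
          keypoint_offset_arcsec(pred_host, true_host, pixel_scale) < 3.0)
```

A prediction counts toward the 99% figure when its IoU with the ground-truth box exceeds 0.5, and toward the 98% figure when the host offset is below 3 arcseconds.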
Abstract
We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved. The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.
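The confidence-based prioritization described above can be read as a greedy assignment. The sketch below illustrates that reading under stated assumptions: each prediction is a plain dictionary with hypothetical `score`, `label`, and `components` keys (the Selavy component IDs enclosed by the predicted box), and a prediction is skipped whenever any of its components has already been claimed by a higher-scoring one. The overlap rule is an assumption for illustration and may differ from the actual pipeline.

```python
def build_catalogue(predictions):
    """Greedily accept predictions in order of decreasing confidence.

    Each prediction is a dict with (hypothetical) keys:
      "score"      - network confidence in [0, 1]
      "label"      - predicted morphology class
      "components" - iterable of Selavy component IDs inside the box
    """
    assigned = set()   # component IDs already placed in the catalogue
    catalogue = []
    for pred in sorted(predictions, key=lambda p: p["score"], reverse=True):
        comps = set(pred["components"])
        # Assumed rule: never reuse a component claimed by a better prediction.
        if comps & assigned:
            continue
        assigned |= comps
        catalogue.append(pred)
    return catalogue


if __name__ == "__main__":
    # Toy input: component 2 is claimed by both an FR-II and a compact source.
    preds = [
        {"score": 0.60, "label": "compact", "components": [2]},
        {"score": 0.90, "label": "FR-II", "components": [1, 2]},
        {"score": 0.80, "label": "compact", "components": [3]},
    ]
    for entry in build_catalogue(preds):
        print(entry["label"], sorted(entry["components"]), entry["score"])
```

In this toy input the lower-scoring compact prediction for component 2 is dropped because the higher-scoring FR-II prediction has already absorbed that component, so each Selavy component ends up in at most one catalogue entry.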
