Machine Learning Frameworks for Large-Scale Radio Surveys: A Summary of Recent Studies

Nikhel Gupta

TL;DR

This work surveys machine learning strategies applied to data from the EMU radio survey, spanning supervised, unsupervised, self-supervised, and weakly supervised approaches. It introduces the RG-CAT pipeline and RadioGalaxyNET for large-scale radio galaxy detection and cataloging, with Gal-DINO achieving strong cross-modal detections of radio galaxies and their infrared hosts. Self-supervised multimodal learning via OpenCLIP enables zero-shot classification and fast retrieval through the EMUSE search engine, while unsupervised methods based on self-organizing maps (SOMs) extend the discovery of rare morphologies and Odd Radio Circles (ORCs). Weakly supervised segmentation based on class activation maps (CAMs) demonstrates effective pixel-level localization using only coarse labels. Collectively, these methods accelerate analysis pipelines, improve catalog completeness, and prepare EMU for the forthcoming SKA era by enabling rapid discovery of novel radio phenomena.

Abstract

The rapid growth of large-scale radio surveys, generating over 100 petabytes of data annually, has created a pressing need for automated data analysis methods. Recent research has explored the application of machine learning techniques to address the challenges associated with detecting and classifying radio galaxies, as well as discovering peculiar radio sources. This paper provides an overview of our investigations with the Evolutionary Map of the Universe (EMU) survey, detailing the methodologies employed, including supervised, unsupervised, self-supervised, and weakly supervised learning approaches, and their implications for ongoing and future radio astronomical surveys.

Paper Structure

This paper contains 9 sections, 1 figure.

Figures (1)

  • Figure 1: An example of radio galaxies from the RG-CAT catalogue, superimposed on the EMU-PS1 radio image, highlighting their host positions, bounding boxes, and classifications generated by the Gal-DINO network. See more examples in gupta24b.