Radio Galaxy Zoo: Morphological classification by Fanaroff-Riley designation using self-supervised pre-training

Nutthawara Buatthaisong, Inigo Val Slijepcevic, Anna M. M. Scaife, Micah Bowles, Andrew Hopkins, Devina Mohan, Stanislav S Shabala, O. Ivy Wong

TL;DR

This work demonstrates a self-supervised foundation-model approach to Fanaroff–Riley morphology classification of radio galaxies, pre-training on RGZ DR1 data and fine-tuning on MiraBest to predict FRI/FRII labels for the RGZ catalogue. It produces a sizeable RGZ FR-z sample and confirms the longstanding overlap between FRIs and FRIIs in luminosity–size space, including a substantial fraction of low-luminosity FRII sources. The study introduces a vote fraction to quantify model confidence, analyzes biases from pre-training and fine-tuning data, and cross-matches RGZ results with LoTSS data to assess consistency across surveys. It highlights the practical impact of training data choices on downstream analyses and discusses moving beyond rigid FR classifications toward descriptive or latent-space representations that better capture physical diversity. Overall, the work emphasizes both the potential and the caveats of automated FR classification in the era of large radio surveys, and it calls for standardized, representative training sets to improve physical interpretability.

Abstract

In this study, we examine over 14,000 radio galaxies finely selected from the Radio Galaxy Zoo (RGZ) project and provide classifications for approximately 5,900 FRIs and 8,100 FRIIs. We present an analysis of these predicted radio galaxy morphologies for the RGZ catalogue, classified using a pre-trained radio galaxy foundation model that has been fine-tuned to predict Fanaroff-Riley (FR) morphology. As seen in previous studies, our results show overlap between morphologically classified FRI and FRII luminosity-size distributions, and we find that the model's confidence in its predictions is lowest in this overlap region, suggesting that source morphologies are more ambiguous there. We identify the presence of low-luminosity FRII sources, the proportion of which, with respect to the total number of FRIIs, is consistent with previous studies. However, a comparison of the low-luminosity FRII sources found in this work with those identified by previous studies reveals differences that may indicate their selection is influenced by the choice of classification methodology. We investigate the impacts of both pre-training and fine-tuning data selection on model performance for the downstream classification task, and show that while different pre-training data choices affect model confidence, they do not appear to cause systematic generalisation biases for the range of physical and observational characteristics considered in this work; however, we note that the same is not necessarily true for fine-tuning. As automated approaches to astronomical source identification and classification become increasingly prevalent, we highlight training data choices that can affect the model outputs and propagate into downstream analyses.

Paper Structure

This paper contains 23 sections, 2 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: The distribution of sources as a function of total flux density and angular size, with the angular size and flux density thresholds of 21.2 arcsec (dashed line) and 0.75 mJy (dotted line), respectively. (a) All RGZ sources (141,678 sources); (b) the RGZ FR sources (14,375 sources) with redshift information and with angular size and flux density above the thresholds; see Section \ref{sec:frcat} for more details.
  • Figure 2: Redshift distribution of RGZ FR sources, including FRIs (orange), FRII-Low objects (red), and luminous FRIIs (blue).
  • Figure 3: Luminosity (W/Hz) at 1.4 GHz versus physical size (Mpc) of 14,375 RGZ FR sources. The dashed line represents the historical FRI/II break ($\sim2\times10^{25}$ W/Hz).
  • Figure 4: WISE colour/colour plot comparing selected RGZ FRI (yellow), FRII-Low (red), and FRII-High (blue) objects with W1 and W2 SNR above 5 and W3 SNR (a) above 3 (2,494 sources) and (b) lower than 3 (8,358 sources).
  • Figure 5: Comparison between RGZ FR sources with a vote fraction (VF) equal to 1 (black) and sources with a VF less than 1 (coloured). (a) Total flux density (mJy) at 1.4 GHz versus angular size (arcsec), with the angular size and flux density thresholds of 21.2 arcsec and 0.75 mJy, respectively, indicated by the dashed lines. (b) WISE colour/colour plot comparing RGZ FR sources with VF equal to 1 and sources with VF less than 1.
  • ...and 8 more figures
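
The selection thresholds quoted in the Figure 1 caption and the break line quoted in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Source` record, the function names, and the strict inequalities are hypothetical, and using the historical break of about 2×10^25 W/Hz as the boundary between FRII-Low and FRII-High is an assumption made here for the example.

```python
from typing import NamedTuple

# Thresholds quoted in the Figure 1 caption.
ANGULAR_SIZE_MIN_ARCSEC = 21.2
FLUX_DENSITY_MIN_MJY = 0.75

# Historical FRI/II break quoted in the Figure 3 caption (approximate).
# ASSUMPTION: used here as the FRII-Low / FRII-High boundary.
FR_BREAK_W_PER_HZ = 2e25


class Source(NamedTuple):
    """Hypothetical minimal record for one radio source."""
    angular_size_arcsec: float
    flux_density_mjy: float
    luminosity_w_per_hz: float  # 1.4 GHz luminosity
    fr_label: str               # predicted class: "FRI" or "FRII"


def passes_cuts(src: Source) -> bool:
    """True if the source lies above both thresholds (strictness assumed)."""
    return (src.angular_size_arcsec > ANGULAR_SIZE_MIN_ARCSEC
            and src.flux_density_mjy > FLUX_DENSITY_MIN_MJY)


def luminosity_group(src: Source) -> str:
    """Split predicted FRIIs at the break line; FRIs are left as one group."""
    if src.fr_label == "FRI":
        return "FRI"
    if src.luminosity_w_per_hz < FR_BREAK_W_PER_HZ:
        return "FRII-Low"
    return "FRII-High"


# Made-up example sources, for illustration only.
examples = [
    Source(30.0, 5.0, 1e24, "FRII"),
    Source(30.0, 5.0, 1e26, "FRII"),
    Source(10.0, 5.0, 1e26, "FRII"),   # fails the angular size cut
    Source(40.0, 2.0, 5e24, "FRI"),
]
selected_groups = [luminosity_group(s) for s in examples if passes_cuts(s)]
```

With the made-up sources above, `selected_groups` holds one group name per source that survives both cuts.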