Radio Galaxy Zoo: Morphological classification by Fanaroff-Riley designation using self-supervised pre-training
Nutthawara Buatthaisong, Inigo Val Slijepcevic, Anna M. M. Scaife, Micah Bowles, Andrew Hopkins, Devina Mohan, Stanislav S. Shabala, O. Ivy Wong
TL;DR
This work demonstrates a self-supervised foundation-model approach to Fanaroff–Riley (FR) morphology classification of radio galaxies. The model is pre-trained on RGZ DR1 data and fine-tuned on MiraBest to predict FRI/FRII labels for the RGZ catalogue. The result is a sizeable RGZ FR-z sample of over 14,000 sources, which confirms the long-standing overlap between FRIs and FRIIs in luminosity–size space, including a substantial population of low-luminosity FRII sources. The study introduces a vote fraction to quantify the model's confidence in each prediction, analyses the biases introduced by the choice of pre-training and fine-tuning data, and cross-matches the RGZ results with LoTSS data to assess consistency across surveys. It shows that training data choices have a practical impact on downstream analyses, and it discusses moving beyond rigid FR classes toward descriptive or latent-space representations that better capture the physical diversity of radio galaxies. Overall, the work emphasises both the potential and the caveats of automated FR classification in the era of large radio surveys, and calls for standardised, representative training sets to improve physical interpretability.
Abstract
In this study, we examine over 14,000 radio galaxies carefully selected from the Radio Galaxy Zoo (RGZ) project and provide classifications for approximately 5,900 FRIs and 8,100 FRIIs. We present an analysis of the predicted radio galaxy morphologies for the RGZ catalogue, classified using a pre-trained radio galaxy foundation model that has been fine-tuned to predict Fanaroff-Riley (FR) morphology. Consistent with previous studies, our results show an overlap between the luminosity-size distributions of morphologically classified FRIs and FRIIs. We find that the model's confidence in its predictions is lowest in this overlap region, suggesting that source morphologies there are more ambiguous. We identify a population of low-luminosity FRII sources whose proportion, relative to the total number of FRIIs, is consistent with previous studies. However, a comparison of the low-luminosity FRII sources found in this work with those identified in previous studies reveals differences that may indicate their selection is influenced by the choice of classification methodology. We investigate the impact of both pre-training and fine-tuning data selection on model performance for the downstream classification task. We show that, while different pre-training data choices affect model confidence, they do not appear to cause systematic generalisation biases across the range of physical and observational characteristics considered in this work; however, the same is not necessarily true for fine-tuning. As automated approaches to astronomical source identification and classification become increasingly prevalent, we highlight training data choices that can affect model outputs and propagate into downstream analyses.
