Decoding the Radio Sky: Bayesian Ensemble Learning and SVD-Based Feature Extraction for Automated Radio Galaxy Classification
Theophilus Ansah-Narh, Jordan Lontsi Tedongmo, Joseph Bremang Tandoh, Nia Imara, Ezekiel Nii Noye Nortey
TL;DR
The study tackles automated classification of radio galaxies in large surveys, addressing scale, noise, and class imbalance. It introduces a framework that fuses SVD-based feature extraction with Local Neighbourhood Encoding and Bayesian ensemble learning, augmented by PSO-tuned baselines and SHAP interpretability. Key results show Bayesian stacking achieving near-perfect performance (about $99.0\%$ accuracy and $F1\approx0.99$) with ROC AUCs around $0.99$, while SHAP reveals which principal components drive morphology distinctions. The approach provides uncertainty quantification, scales to next-generation survey pipelines, and offers a reproducible, interpretable path for automated radio galaxy classification in data-intensive astronomy.
Abstract
The classification of radio galaxies is central to understanding galaxy evolution, active galactic nuclei dynamics, and the large-scale structure of the universe. However, traditional manual techniques are inadequate for processing the massive, heterogeneous datasets generated by modern radio surveys. In this study, we present a probabilistic machine learning framework that integrates Singular Value Decomposition (SVD) for feature extraction with Bayesian ensemble learning to achieve robust, scalable radio galaxy classification. The SVD step reduces dimensionality while preserving key morphological structures, enabling an efficient representation of galaxy features. To mitigate class imbalance without introducing artefacts, we incorporate a Local Neighbourhood Encoding strategy tailored to the astrophysical distribution of galaxy types. The resulting features are used to train and optimize several baseline classifiers (Logistic Regression, Support Vector Machines, LightGBM, and Multi-Layer Perceptrons) within bagging, boosting, and stacking ensembles governed by a Bayesian weighting scheme. Our results demonstrate that Bayesian ensembles outperform their traditional counterparts across all metrics, with the Bayesian stacking model achieving a classification accuracy of 99.0% and an F1-score of 0.99 across Compact, Bent, Fanaroff-Riley Type I (FR-I), and Fanaroff-Riley Type II (FR-II) sources. Interpretability is enhanced through SHAP analysis, which highlights the principal components most associated with morphological distinctions. Beyond improving classification performance, our framework facilitates uncertainty quantification, paving the way for more reliable integration into next-generation survey pipelines. This work contributes a reproducible and interpretable methodology for automated galaxy classification in the era of data-intensive radio astronomy.
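To make two of the ideas above concrete, the following is a minimal Python sketch, not the implementation used in this work. It illustrates (i) SVD-based feature extraction by projecting flattened, mean-centred images onto their leading right singular vectors, and (ii) one simple form of Bayesian weighting, in which each model's weight is proportional to its likelihood on held-out labels under a uniform prior. The abstract does not specify the exact weighting scheme, so this choice is an assumption. The function names (svd_features, bayesian_weights, combine), the random stand-in images, and the two hypothetical models are illustrative only.

```python
import numpy as np


def svd_features(images, k):
    """Project flattened, mean-centred images onto their top-k right singular vectors."""
    X = images.reshape(len(images), -1).astype(float)
    Xc = X - X.mean(axis=0)
    # Economy-size SVD: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    features = Xc @ components.T  # same as U[:, :k] * s[:k]
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return features, components, explained


def bayesian_weights(val_probas, y_val, eps=1e-12):
    """Assumed scheme: weight_m is proportional to the likelihood of model m
    on the validation labels, with a uniform prior over models."""
    idx = np.arange(len(y_val))
    loglik = np.array([np.log(p[idx, y_val] + eps).sum() for p in val_probas])
    loglik -= loglik.max()  # subtract the maximum for numerical stability
    w = np.exp(loglik)
    return w / w.sum()


def combine(probas, weights):
    """Weighted average of per-model class probabilities, shape (n_samples, n_classes)."""
    return np.tensordot(weights, np.stack(probas), axes=1)


# Synthetic stand-in data: 40 random 16x16 "images" and 4 class labels
# (standing in for Compact, Bent, FR-I, FR-II). These are not survey data.
rng = np.random.default_rng(0)
images = rng.normal(size=(40, 16, 16))
y_val = rng.integers(0, 4, size=40)

features, components, explained = svd_features(images, k=5)

# Two hypothetical models: one puts 0.7 on the true class, one is uniform.
informative = np.full((40, 4), 0.1)
informative[np.arange(40), y_val] = 0.7
uniform = np.full((40, 4), 0.25)

weights = bayesian_weights([informative, uniform], y_val)
combined = combine([informative, uniform], weights)
```

In this sketch the informative model receives almost all of the weight, because raw likelihood weights sharpen quickly as the number of validation samples grows. A practical scheme might temper the likelihoods or place a prior on the weights; which variant, if any, matches the framework described here is not stated in the abstract.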
