Decoding the Radio Sky: Bayesian Ensemble Learning and SVD-Based Feature Extraction for Automated Radio Galaxy Classification

Theophilus Ansah-Narh, Jordan Lontsi Tedongmo, Joseph Bremang Tandoh, Nia Imara, Ezekiel Nii Noye Nortey

TL;DR

The study tackles automated classification of radio galaxies in large surveys, addressing scale, noise, and class imbalance. It introduces a framework that fuses SVD‑based feature extraction with Local Neighbourhood Encoding and Bayesian ensemble learning, augmented by PSO‑tuned baselines and SHAP interpretability. Key results show Bayesian Stacking achieving near‑perfect performance (about $99.0\%$ accuracy and $F1\approx0.99$) with ROC AUCs around $0.99$, while SHAP reveals which principal components drive morphology distinctions. The approach provides uncertainty quantification, scales to next‑generation survey pipelines, and offers a reproducible, interpretable path for automated radio galaxy classification in data‑intensive astronomy.

Abstract

The classification of radio galaxies is central to understanding galaxy evolution, active galactic nuclei dynamics, and the large-scale structure of the universe. However, traditional manual techniques are inadequate for processing the massive, heterogeneous datasets generated by modern radio surveys. In this study, we present a probabilistic machine learning framework that integrates Singular Value Decomposition (SVD) for feature extraction with Bayesian ensemble learning to achieve robust, scalable radio galaxy classification. The SVD approach effectively reduces dimensionality while preserving key morphological structures, enabling efficient representation of galaxy features. To mitigate class imbalance and avoid the introduction of artefacts, we incorporate a Local Neighbourhood Encoding strategy tailored to the astrophysical distribution of galaxy types. The resulting features are used to train and optimize several baseline classifiers: Logistic Regression, Support Vector Machines, LightGBM, and Multi-Layer Perceptrons within bagging, boosting, and stacking ensembles governed by a Bayesian weighting scheme. Our results demonstrate that Bayesian ensembles outperform their traditional counterparts across all metrics, with the Bayesian stacking model achieving a classification accuracy of 99.0% and an F1-score of 0.99 across Compact, Bent, Fanaroff-Riley Type I (FR-I), and Type II (FR-II) sources. Interpretability is enhanced through SHAP analysis, which highlights the principal components most associated with morphological distinctions. Beyond improving classification performance, our framework facilitates uncertainty quantification, paving the way for more reliable integration into next-generation survey pipelines. This work contributes a reproducible and interpretable methodology for automated galaxy classification in the era of data-intensive radio astronomy.
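The abstract does not spell out the Bayesian weighting scheme, so the following is only a minimal sketch of one common choice, assumed here for illustration: each base classifier receives a weight proportional to its likelihood on held-out validation data under a uniform prior over models, and the ensemble prediction is the weighted average of the class probabilities. The function names `bayesian_weights` and `weighted_predict` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np

def bayesian_weights(val_probs, y_val, eps=1e-12):
    """Model weights proportional to validation likelihood (uniform model prior).

    val_probs: list of (n_samples, n_classes) probability arrays, one per model.
    y_val: integer class labels of the validation samples.
    NOTE: this weighting rule is an assumption, not the paper's stated scheme.
    """
    y_val = np.asarray(y_val)
    rows = np.arange(y_val.size)
    # Log-likelihood of the true labels under each model.
    log_liks = np.array(
        [np.log(np.clip(p[rows, y_val], eps, 1.0)).sum() for p in val_probs]
    )
    log_liks -= log_liks.max()  # stabilise the exponentials
    w = np.exp(log_liks)
    return w / w.sum()

def weighted_predict(probs, weights):
    """Weighted average of per-model class probabilities, then argmax."""
    stacked = np.stack(probs)                      # (n_models, n_samples, n_classes)
    avg = np.tensordot(weights, stacked, axes=1)   # (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# Toy example with four classes: 0 Compact, 1 Bent, 2 FR-I, 3 FR-II.
y = np.array([0, 1, 2, 3])
sharp = np.full((4, 4), 0.1)
sharp[np.arange(4), y] = 0.7          # confident, correct model
flat = np.full((4, 4), 0.25)          # uninformative model

weights = bayesian_weights([sharp, flat], y)
labels, avg_probs = weighted_predict([sharp, flat], weights)
print(weights, labels)
```

In this toy case the confident model dominates the weights, and the spread of the averaged probabilities gives a simple handle on predictive uncertainty.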

Paper Structure

This paper contains 19 sections, 28 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: A visual representation of the four main types of radio galaxies: Compact, FRI, FRII, and Bent. Each row showcases examples of galaxies with distinct radio morphologies, revealing clues about their activity and evolution. Note: Each $128 \times 128$ pixel stamp corresponds to an angular field of view of approximately $230^{\prime\prime} \times 230^{\prime\prime}$ (i.e., $\approx 3.8^{\prime} \times 3.8^{\prime}$), based on the FIRST survey pixel scale of $1.8^{\prime\prime}$ per pixel.
  • Figure 2: Singular value decay profiles for four morphological types of radio galaxies: Compact (blue circles), Bent (orange squares), FRI (green triangles), and FRII (red crosses). The y-axis represents the normalised singular values, and the x-axis corresponds to the rank of the singular value components. Compact galaxies exhibit a sharp decline, indicative of their simple morphology. Bent galaxies show a slower decay, reflecting their intermediate complexity due to jet bending. FRI galaxies exhibit a moderate decay rate, representing their diffuse and gradual jet structures, while FRII galaxies display the slowest decay, highlighting their intrinsic structural complexity due to prominent lobes and hotspots. Note: Singular values are shown in normalised form ($\sigma_i^2 / \sum_j \sigma_j^2$) to ensure comparability across subsets, since raw singular values are scale-dependent.
  • Figure 3: Reconstruction of radio galaxy images using the top $120$ singular values out of 16,384. The first two rows correspond to reconstructed images for various galaxy types (Compact, Bent, FRI, and FRII), showcasing the retention of core morphological features such as jets, lobes, and hotspots. The bottom row displays the residual differences between the original and reconstructed images, highlighting the absence of finer details and noise in the reconstructions. The colour bar indicates the normalised intensity levels, with brighter regions representing higher energy emissions. This reconstruction emphasises the dominant structures while effectively suppressing noise and minor variations, illustrating the utility of Singular Value Decomposition for dimensionality reduction in astronomical imaging. Note: The reconstructed, residual, and original image stamps are $128 \times 128$ pixels, corresponding to an angular extent of roughly $230^{\prime\prime} \times 230^{\prime\prime}$ (i.e., $\approx 3.8^{\prime} \times 3.8^{\prime}$), using the FIRST survey pixel scale of $1.8^{\prime\prime}$ per pixel.
  • Figure 4: Class distribution of radio galaxy categories before and after applying the LNE algorithm. The initial distribution (left plot) is imbalanced, with sample sizes varying across classes. After balancing (right plot), every class contains the same number of samples, which facilitates unbiased model training and improved classification performance.
  • Figure 5: Decision boundaries of four baseline models (Logistic Regression, SVM, LightGBM, and MLP) trained on UMAP-transformed radio galaxy data. UMAP reduces the data's dimensionality while preserving its structure. Each plot overlays the model's decision boundaries on the UMAP-transformed data points, with colours indicating the predicted classes: 0 (Compact), 1 (Bent), 2 (FR-I), and 3 (FR-II). The plots allow a visual comparison of how well each model separates the classes in the feature space.
  • ...and 5 more figures
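
The quantities in the captions of Figures 1 to 3 can be illustrated with a short NumPy sketch. It is not the authors' pipeline: it assumes each image is flattened into one row of a data matrix, applies no mean-centring, and uses small synthetic Gaussian blobs in place of FIRST cutouts. The sizes (`n_images`, `side`, `k`) are hypothetical and chosen only so the example runs quickly.

```python
import numpy as np

# Angular size of a stamp: 128 pixels at the FIRST scale of 1.8 arcsec/pixel.
fov_arcsec = 128 * 1.8           # about 230 arcsec
fov_arcmin = fov_arcsec / 60.0   # about 3.8 arcmin

# Synthetic stand-in data: noisy Gaussian blobs (NOT real radio galaxies).
rng = np.random.default_rng(0)
n_images, side, k = 40, 16, 10
yy, xx = np.mgrid[0:side, 0:side]
images = np.empty((n_images, side, side))
for i in range(n_images):
    cx, cy = rng.uniform(4, 12, size=2)
    width = rng.uniform(1.0, 3.0)
    images[i] = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * width ** 2))
images += 0.01 * rng.standard_normal(images.shape)

# Flatten each image into a row: shape (n_images, side * side).
X = images.reshape(n_images, -1)

# Thin SVD; singular values come back sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Normalised spectrum, as in the Figure 2 note: sigma_i^2 / sum_j sigma_j^2.
energy = s**2 / np.sum(s**2)

# Keep the top-k right singular vectors as a feature basis.
features = X @ Vt[:k].T          # (n_images, k) features for a classifier
X_rec = features @ Vt[:k]        # rank-k reconstruction of every image
residual = X - X_rec             # what the truncation discards

rel_err = np.linalg.norm(residual) / np.linalg.norm(X)
print(f"FoV: {fov_arcsec:.1f} arcsec, energy kept: {energy[:k].sum():.4f}")
```

By the Eckart-Young theorem, the squared relative reconstruction error equals the normalised energy in the discarded components, so the decay profile directly indicates how many components a given morphology needs.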