
Evaluating Bayesian deep learning for radio galaxy classification

Devina Mohan, Anna M. M. Scaife

TL;DR

The paper tackles reliable Bayesian deep learning for radio galaxy classification when labeled datasets are small. It compares five approaches to approximating the posterior: Hamiltonian Monte Carlo (HMC), Variational Inference (VI), the Last-layer Laplace Approximation (LLA), MC Dropout and Deep Ensembles. These are evaluated on predictive performance, uncertainty calibration and distribution-shift detection using the MiraBest, MIGHTEE and GalaxyMNIST datasets, by examining samples from the posterior $p(\theta \mid D)$ and the posterior predictive $p(y \mid x, D)$. HMC and VI deliver strong predictive performance and well-calibrated uncertainties, with VI achieving the best predictive accuracy at substantially lower computational cost than HMC; MC Dropout and Deep Ensembles lag in calibration. Energy-score analyses show that HMC and VI are more effective at identifying distribution shifts, supporting their use in survey pipelines. The work suggests VI as a practical route for future radio surveys, while highlighting open questions on cold posterior effects and energy-based out-of-distribution (OoD) metrics.
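The posterior and the posterior predictive are usually linked by a Monte Carlo average, $p(y \mid x, D) \approx \frac{1}{S}\sum_{s=1}^{S} p(y \mid x, \theta_s)$ with $\theta_s \sim p(\theta \mid D)$. The NumPy sketch below shows that averaging step, assuming you already have per-sample logits from some inference method (HMC samples, VI draws, dropout masks or ensemble members). The function names and the use of predictive entropy as the uncertainty summary are illustrative assumptions, not the paper's code.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the per-row maximum for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def posterior_predictive(sample_logits: np.ndarray) -> np.ndarray:
    """Monte Carlo estimate of p(y | x, D) (hypothetical helper).

    sample_logits has shape (S, N, C): S posterior samples theta_s,
    N input images and C classes (C = 2 for FRI/FRII).
    """
    probs = softmax(sample_logits)  # p(y | x, theta_s) for every posterior sample
    return probs.mean(axis=0)       # average over the S samples -> shape (N, C)


def predictive_entropy(pred: np.ndarray) -> np.ndarray:
    # Entropy of the averaged predictive distribution; higher means less certain.
    return -(pred * np.log(np.clip(pred, 1e-12, None))).sum(axis=-1)


# Usage: two posterior samples that confidently disagree about one image.
sample_logits = np.array([[[10.0, 0.0]], [[0.0, 10.0]]])
pred = posterior_predictive(sample_logits)
print(pred, predictive_entropy(pred))
```

Because the two samples disagree, the averaged prediction is close to [0.5, 0.5] and its entropy is close to the two-class maximum, log 2 ≈ 0.693, even though each individual sample is confident. Averaging probabilities rather than logits is what lets disagreement between posterior samples show up as uncertainty.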

Abstract

The radio astronomy community is rapidly adopting deep learning techniques to deal with the huge data volumes expected from the next generation of radio observatories. Bayesian neural networks (BNNs) provide a principled way to model uncertainty in the predictions made by such deep learning models and will play an important role in extracting well-calibrated uncertainty estimates on their outputs. In this work, we evaluate the performance of different BNNs against the following criteria: predictive performance, uncertainty calibration and distribution-shift detection for the radio galaxy classification problem.
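Uncertainty calibration asks whether a model's stated confidence matches how often it is right. One widely used metric is the expected calibration error over equal-width confidence bins, $\mathrm{ECE} = \sum_{m} \frac{|B_m|}{N}\,\lvert \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \rvert$. The sketch below is an illustrative choice only: the paper's own calibration metric may differ, and the function name and 10-bin default are assumptions.

```python
import numpy as np


def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Equal-width-bin ECE (hypothetical helper).

    probs: predictive probabilities of shape (N, C), e.g. a posterior predictive.
    labels: integer class labels of shape (N,).
    """
    confidences = probs.max(axis=1)  # confidence assigned to the predicted class
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; softmax confidences are always >= 1/C > 0.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)
```

A model that predicts 0.75 confidence and is right three times out of four scores an ECE of 0, while one that predicts 0.95 confidence but is right only half the time scores about 0.45.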

Paper Structure

This paper contains 25 sections, 5 equations, 2 figures and 1 table.

Figures (2)

  • Figure 1: Images from the datasets used in this work. The top two rows contain Fanaroff-Riley Type I (FRI) and Type II (FRII) radio galaxies from the MiraBest Confident dataset, on which our BNNs are trained. The third row contains FRI/FRII galaxies from the MIGHTEE dataset, and the fourth row contains optical galaxies from the GalaxyMNIST dataset. We use the MIGHTEE and GalaxyMNIST datasets to evaluate our models' ability to detect different types of distribution shift. See the data section for details about the datasets.
  • Figure 2: Detecting distribution shift with energy scores. Histograms of energy scores calculated on the MiraBest Confident (MBConf; blue), GalaxyMNIST (orange) and MIGHTEE (green) test datasets for each model considered in this work; see the section on out-of-distribution detection for details. The histograms use a bin width of $0.1$, and the axes are truncated to show where samples from each dataset lie. HMC is the only inference method for which all three datasets can be easily distinguished.
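For readers unfamiliar with energy scores, a widely used definition for a classifier with logits $f_c(x)$ is $E(x) = -T \log \sum_{c} e^{f_c(x)/T}$, where lower energy is typically associated with in-distribution inputs (some works plot the negative energy instead). The NumPy sketch below computes it stably. How the paper aggregates energies over posterior samples is not stated here; averaging per-sample energies, the temperature default and the quantile threshold rule are all assumptions, and the helper names are hypothetical.

```python
import numpy as np


def energy_score(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """E(x) = -T * log sum_c exp(f_c(x) / T) over the last axis, computed stably."""
    z = logits / temperature
    m = z.max(axis=-1, keepdims=True)
    # log-sum-exp trick: factor out the maximum before exponentiating.
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse


def bnn_energy(sample_logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    # Assumption: average the energy over S posterior samples, (S, N, C) -> (N,).
    return energy_score(sample_logits, temperature).mean(axis=0)


def flag_shifted(energies: np.ndarray, reference_energies: np.ndarray,
                 quantile: float = 0.95) -> np.ndarray:
    # Hypothetical rule: flag inputs whose energy exceeds the chosen quantile
    # of energies measured on in-distribution (e.g. MiraBest) test data.
    threshold = np.quantile(reference_energies, quantile)
    return energies > threshold
```

Histogramming `bnn_energy` on each test set corresponds to the kind of comparison shown in Figure 2: the better separated the in-distribution and shifted histograms are, the more reliably a single threshold can flag shifted inputs.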