Evaluating Bayesian deep learning for radio galaxy classification
Devina Mohan, Anna M. M. Scaife
TL;DR
The paper evaluates how reliably Bayesian deep learning methods classify radio galaxies when labeled data are scarce. It compares posterior approximations, including Hamiltonian Monte Carlo (HMC), Variational Inference (VI), last-layer Laplace approximation (LLA), MC Dropout, and Deep Ensembles, on predictive performance, uncertainty calibration, and distribution-shift detection. The comparison uses the MiraBest, MIGHTEE, and GalaxyMNIST datasets and examines samples from the posterior $p(\theta|D)$ and the posterior predictive $p(y|x,D)$. HMC and VI deliver strong predictive performance and well-calibrated uncertainties, with VI achieving the best predictive accuracy at substantially lower computational cost than HMC; MC Dropout and Deep Ensembles are less well calibrated. Energy-score analyses indicate that HMC and VI are more effective at identifying distribution shift, which supports their use in survey pipelines. The work suggests VI as a practical route for future radio surveys, while leaving open questions about the cold posterior effect and energy-based out-of-distribution (OoD) metrics.
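For readers unfamiliar with the posterior predictive, it is commonly approximated by averaging class probabilities over posterior samples: $p(y|x,D) \approx \frac{1}{N}\sum_{i=1}^{N} p(y|x,\theta_i)$ with $\theta_i \sim p(\theta|D)$. The following is a minimal sketch of that average and of the predictive entropy often used as an uncertainty summary. It is not the authors' code: the function names and the toy logits are hypothetical, and it assumes each posterior sample yields a vector of class logits for the same input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_predictive(logit_samples):
    """Monte Carlo estimate of p(y|x,D).

    logit_samples: list of N lists, each holding the class logits produced
    by one sampled set of weights theta_i for the same input x.
    Returns the class probabilities averaged over the N samples.
    """
    probs = [softmax(logits) for logits in logit_samples]
    n = len(probs)
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n for k in range(n_classes)]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of the averaged class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Toy example with made-up logits: three posterior samples, two classes.
samples = [[2.0, 0.0], [1.5, 0.5], [0.2, 0.1]]
mean_probs = posterior_predictive(samples)
entropy = predictive_entropy(mean_probs)
```

The same averaging applies whichever approximation supplies the samples (HMC chains, VI draws, dropout masks, or ensemble members); only the way the $\theta_i$ are obtained differs.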
Abstract
The radio astronomy community is rapidly adopting deep learning techniques to deal with the huge data volumes expected from the next generation of radio observatories. Bayesian neural networks (BNNs) provide a principled way to model uncertainty in the predictions made by such deep learning models and will play an important role in extracting well-calibrated uncertainty estimates on their outputs. In this work, we evaluate the performance of different BNNs against the following criteria: predictive performance, uncertainty calibration and distribution-shift detection for the radio galaxy classification problem.
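To make the distribution-shift criterion concrete, one widely used logit-based score is the energy $E(x) = -T \log \sum_k \exp(f_k(x)/T)$, where $f_k(x)$ are the class logits and $T$ is a temperature. The sketch below assumes this common definition from the energy-based OoD literature; the text above does not give the exact formula used in the paper, and the function name and example logits are hypothetical.

```python
import math


def energy_score(logits, temperature=1.0):
    """Energy of one input: -T * logsumexp(logits / T).

    Uses the max-subtraction trick so large logits do not overflow.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -temperature * lse


# Made-up logits: a confident prediction versus a nearly flat one.
confident = energy_score([10.0, 0.0])
uncertain = energy_score([0.1, 0.0])
```

Under this definition, inputs with large, peaked logits receive lower energy than inputs with flat logits, so comparing energy distributions between a reference set and a new survey is one way to flag a possible shift. For a Bayesian model, the score can be computed per posterior sample and then summarized.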
