Natural gradient descent for improving variational inference based classification of radio galaxies
Devina Mohan, Anna M. M. Scaife
TL;DR
The paper addresses uncertainty calibration and distribution-shift detection in radio galaxy classification with Bayesian neural networks. It investigates natural gradient descent for variational inference (VI) through the Improved Variational Online Newton (iVON) algorithm, which exploits the Fisher information geometry with a diagonal Hessian approximation to enable scalable training. iVON yields better uncertainty calibration and faster convergence than Bayes by Backprop VI (BBB-VI), with predictive performance comparable to Hamiltonian Monte Carlo (HMC). However, iVON-trained models detect far out-of-distribution (OoD) optical galaxy images but not radio galaxy images from a telescope with different resolution and sensitivity, and the cold posterior effect persists. The work highlights optimizer-induced inductive biases as a crucial factor in VI outcomes and informs future design choices for probabilistic models in large-scale astronomical surveys.
Abstract
Bayesian neural networks (BNNs) are most commonly optimised with first-order optimisers such as stochastic gradient descent. However, when optimising the parameters of probabilistic models, incorporating second-order information during optimisation can lead to a more direct path in distribution space and faster convergence. In this work we examine whether natural gradient descent can improve the performance of variational inference based classification of radio galaxies. We use the Improved Variational Online Newton (iVON) algorithm and compare its performance against a recent benchmark for BNNs for radio galaxy classification. We find that iVON gives the best uncertainty calibration of all the methods previously considered, while providing predictive performance similar to the best-performing inference methods, such as Hamiltonian Monte Carlo and Bayes by Backprop-based variational inference. Models trained with iVON can distinguish far out-of-distribution optical galaxy data, but they cannot reliably detect radio galaxy images from a telescope with different resolution and sensitivity. We also find that the cold posterior effect persists in models trained with iVON. Our results suggest that the choice of optimiser can lead to qualitatively different solutions, and that future work using probabilistic neural network models should carefully consider the inductive biases encoded through the optimisation process, in addition to the data, architecture and inference method.
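To make the natural-gradient idea more concrete, here is a minimal NumPy sketch of an iVON-style update on a toy separable quadratic loss. It follows our reading of the published IVON update (a diagonal Hessian estimated with the reparameterisation trick, a correction term that keeps that estimate positive, and a curvature-preconditioned step on the variational mean), but it is not the authors' code or the reference implementation. The function name `ivon_sketch`, the single Monte Carlo sample per step, the toy loss and every hyperparameter value (including `beta2=0.99`, picked so this short run settles quickly; values much closer to 1 are typical for real networks) are illustrative assumptions. Here `ess` plays the role of the effective sample size, so for a loss with curvature `a` the variational standard deviation should approach roughly `1/sqrt(ess * a)`.

```python
import numpy as np


def ivon_sketch(grad_fn, dim, steps=2000, lr=0.1, ess=1000.0, h0=1.0,
                beta1=0.9, beta2=0.99, weight_decay=1e-4, seed=0):
    """Hypothetical iVON-style loop with a diagonal Gaussian q = N(m, diag(sigma**2)).

    Illustrative sketch only: one MC sample per step, no minibatching.
    """
    rng = np.random.default_rng(seed)
    m = np.zeros(dim)          # variational mean
    h = np.full(dim, h0)       # running diagonal Hessian estimate
    g = np.zeros(dim)          # running (momentum) gradient
    for t in range(1, steps + 1):
        # Posterior std implied by the current curvature estimate and ESS
        sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
        eps = sigma * rng.standard_normal(dim)     # eps ~ N(0, sigma**2)
        g_hat = grad_fn(m + eps)                   # gradient at a sampled weight
        h_hat = g_hat * eps / sigma**2             # reparameterisation Hessian estimate
        g = beta1 * g + (1.0 - beta1) * g_hat
        # EMA of the Hessian plus a correction term that keeps h + weight_decay > 0
        h = (beta2 * h + (1.0 - beta2) * h_hat
             + 0.5 * (1.0 - beta2) ** 2 * (h - h_hat) ** 2 / (h + weight_decay))
        g_bar = g / (1.0 - beta1 ** t)             # bias-corrected momentum
        # Curvature-preconditioned (natural-gradient-like) step on the mean
        m = m - lr * (g_bar + weight_decay * m) / (h + weight_decay)
    sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
    return m, h, sigma


# Toy separable quadratic loss 0.5 * sum(a * (theta - c)**2) with curvature a
a = np.array([1.0, 4.0])
c = np.array([2.0, -1.0])


def toy_grad(theta):
    return a * (theta - c)


m, h, sigma = ivon_sketch(toy_grad, dim=2)
print("mean:", m, "hessian:", h, "std:", sigma)
```

Because the mean step divides by `h + weight_decay`, high-curvature directions take smaller steps and low-curvature directions take larger ones, which is the diagonal approximation to following the natural gradient. In this sketch, tempering the likelihood (as in cold-posterior experiments) would roughly amount to scaling up `ess`, which shrinks `sigma`.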
