Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural Networks

Ponkrshnan Thiagarajan; Susanta Ghosh

TL;DR

Two novel loss functions for BNNs are proposed. The first is based on a new modification to the generalized Jensen-Shannon (JS) divergence that makes it bounded. The second is based on the Geometric JS divergence and is computationally efficient because it can be evaluated analytically.

Abstract

Bayesian neural networks (BNNs) are state-of-the-art machine learning methods that can naturally regularize and systematically quantify uncertainties using their stochastic parameters. Kullback-Leibler (KL) divergence-based variational inference used in BNNs suffers from unstable optimization and challenges in approximating light-tailed posteriors due to the unbounded nature of the KL divergence. To resolve these issues, we formulate a novel loss function for BNNs based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded. In addition, we propose a Geometric JS divergence-based loss, which is computationally efficient since it can be evaluated analytically. We found that the JS divergence-based variational inference is intractable, and hence employed a constrained optimization framework to formulate these losses. Our theoretical analysis and empirical experiments on multiple regression and classification data sets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the data sets are noisy or biased. Specifically, there are approximately 5% and 8% improvements in accuracy for a noise-added CIFAR-10 dataset and a regression dataset, respectively. There is about a 13% reduction in false negative predictions of a biased histopathology dataset. In addition, we quantify and compare the uncertainty metrics for the regression and classification tasks.
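
The contrast between an unbounded KL divergence and a bounded JS divergence can be illustrated with a minimal sketch. It uses the standard, equally weighted JS divergence with an arithmetic-mean mixture, which is only a stand-in and is not the paper's modified generalized JS divergence or its Geometric JS divergence. The function names are hypothetical, the JS value is obtained by simple trapezoidal integration, and the fixed parameters are borrowed from the Figure 2 caption ($\sigma_q^2 = 0.01$, $\mu_p = 0$, $\sigma_p^2 = 0.1$).

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def kl_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between two univariate Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)


def js_gaussians(mu_q, var_q, mu_p, var_p, n=20001, width=12.0):
    """Standard JS divergence (arithmetic mixture, weights 1/2) by trapezoidal
    integration. Illustrative stand-in, not the paper's modified divergence."""
    sd_q, sd_p = math.sqrt(var_q), math.sqrt(var_p)
    lo = min(mu_q - width * sd_q, mu_p - width * sd_p)
    hi = max(mu_q + width * sd_q, mu_p + width * sd_p)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        q = gaussian_pdf(x, mu_q, var_q)
        p = gaussian_pdf(x, mu_p, var_p)
        m = 0.5 * (q + p)
        term = 0.0
        # Skip points where a density underflows to zero (0 * log 0 := 0).
        if q > 0.0 and m > 0.0:
            term += 0.5 * q * math.log(q / m)
        if p > 0.0 and m > 0.0:
            term += 0.5 * p * math.log(p / m)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * term * h
    return total


if __name__ == "__main__":
    var_q, mu_p, var_p = 0.01, 0.0, 0.1  # fixed values from the Figure 2 caption
    for mu_q in (0.1, 1.0, 5.0, 20.0):
        kl = kl_gaussians(mu_q, var_q, mu_p, var_p)
        js = js_gaussians(mu_q, var_q, mu_p, var_p)
        print(f"mu_q={mu_q:5.1f}  KL={kl:10.3f}  JS={js:.4f}  (log 2 = {math.log(2):.4f})")
```

As $\mu_q$ moves away from $\mu_p$, the KL value grows quadratically without limit, while the standard JS value saturates just below $\log 2$. This is the kind of boundedness the abstract refers to, shown here only for the textbook JS divergence.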

Paper Structure (45 sections, 44 equations, 15 figures, 7 tables, 2 algorithms)

Introduction
Key contributions
Mathematical Background
Background: KL and JS divergences
Background: Variational inference
Methods
Proposed modification to the generalized JS divergence
Intractability of the JS divergence-based loss functions formulated through the variational inference approach
Proposed JS divergence-based loss functions formulated through a constrained optimization approach
Geometric JS divergence
Modified generalized JS divergence
Minimisation of the proposed loss functions
Evaluation of the JS-G divergence in a closed-form
Evaluation of divergences via a Monte Carlo sampling
Insights into the proposed JS divergence-based loss functions
...and 30 more sections

Figures (15)

Figure 1: Depiction of the unboundedness (denoted by $\infty$) of the KL and JS-G divergence and the boundedness of the JS-A divergence. The distributions $q$ and $P$ are assumed Gaussian and Uniform respectively with $q = \mathcal{N}(0,\sigma)$ and $P = \mathcal{U}(-5,5)$.
Figure 2: Comparison of the KL and the JS divergences of distributions $P$ and $q$. In (a) and (b), $\sigma_q^2, \mu_p, \sigma_p^2$ are fixed and $\mu_q$ is varied. In (c) and (d), $\mu_q, \mu_p, \sigma_p^2$ are fixed and $\sigma_q^2$ is varied. The fixed values of the parameters are $\mu_q = 0.1, \sigma_q^2 = 0.01, \mu_p = 0, \sigma_p^2 = 0.1$.
Figure 3: Training and validation of (a) CIFAR-10 with added Gaussian noise (b) histopathology data set with bias.
Figure 4: Accuracy on (a-b) the CIFAR-10 test data at different noise levels (c) histopathology test data. Each box chart displays the median as the center line, the lower and upper quartiles as the box edges, and the minimum and maximum values as whiskers. (d) ROC curves, (e)-(g) Confusion matrices for histopathology data.
Figure 5: Confusion matrices for the histopathology dataset
...and 10 more figures
