Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural Networks

Ponkrshnan Thiagarajan, Susanta Ghosh

TL;DR

A novel loss function for BNNs is formulated based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded. A Geometric JS divergence-based loss is also proposed; it is computationally efficient because it can be evaluated analytically.

Abstract

Bayesian neural networks (BNNs) are state-of-the-art machine learning methods that can naturally regularize and systematically quantify uncertainties using their stochastic parameters. Kullback-Leibler (KL) divergence-based variational inference used in BNNs suffers from unstable optimization and challenges in approximating light-tailed posteriors due to the unbounded nature of the KL divergence. To resolve these issues, we formulate a novel loss function for BNNs based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded. In addition, we propose a Geometric JS divergence-based loss, which is computationally efficient since it can be evaluated analytically. We found that JS divergence-based variational inference is intractable and hence employed a constrained optimization framework to formulate these losses. Our theoretical analysis and empirical experiments on multiple regression and classification datasets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the datasets are noisy or biased. Specifically, there are approximately 5% and 8% improvements in accuracy for a noise-added CIFAR-10 dataset and a regression dataset, respectively. There is about a 13% reduction in false negative predictions on a biased histopathology dataset. In addition, we quantify and compare the uncertainty metrics for the regression and classification tasks.

Paper Structure (45 sections, 44 equations, 15 figures, 7 tables, 2 algorithms)

Figures (15)

  • Figure 1: Depiction of the unboundedness (denoted by $\infty$) of the KL and JS-G divergences and the boundedness of the JS-A divergence. The distributions $q$ and $P$ are assumed to be Gaussian and uniform, respectively, with $q = \mathcal{N}(0,\sigma)$ and $P = \mathcal{U}(-5,5)$.
  • Figure 2: Comparison of the KL and the JS divergences of distributions $P$ and $q$. (a) and (b): $\sigma_q^2, \mu_p, \sigma_p^2$ are fixed and $\mu_q$ is varied. (c) and (d): $\mu_q, \mu_p, \sigma_p^2$ are fixed and $\sigma_q^2$ is varied. The fixed values of the parameters are $\mu_q = 0.1, \sigma_q^2 = 0.01, \mu_p = 0, \sigma_p^2 = 0.1$.
  • Figure 3: Training and validation of (a) CIFAR-10 with added Gaussian noise and (b) the histopathology dataset with bias.
  • Figure 4: Accuracy on (a)-(b) the CIFAR-10 test data at different noise levels and (c) the histopathology test data. Each box chart displays the median as the center line, the lower and upper quartiles as the box edges, and the minimum and maximum values as whiskers. (d) ROC curves and (e)-(g) confusion matrices for the histopathology data.
  • Figure 5: Confusion matrices for the histopathology dataset.
  • ...and 10 more figures
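
The contrast described in the abstract and in the captions of Figures 1 and 2 (unbounded KL and JS-G, bounded JS-A, and a JS-G that can be evaluated analytically) can be illustrated for univariate Gaussians. The sketch below is not the paper's loss. It assumes the standard arithmetic JS divergence with mixture $m = (q + p)/2$ and one common skew-geometric form, $(1-\alpha)\,\mathrm{KL}(q \,\|\, g) + \alpha\,\mathrm{KL}(p \,\|\, g)$ with $g \propto q^{1-\alpha} p^{\alpha}$; the paper's modified divergences and skew convention may differ. All function names are hypothetical, and the default parameters are the fixed values quoted in the Figure 2 caption.

```python
import math


def kl_gauss(mu_a, var_a, mu_b, var_b):
    """Closed-form KL(N(mu_a, var_a) || N(mu_b, var_b)) for univariate Gaussians."""
    return 0.5 * (math.log(var_b / var_a)
                  + (var_a + (mu_a - mu_b) ** 2) / var_b
                  - 1.0)


def geometric_mean_gauss(mu_a, var_a, mu_b, var_b, alpha):
    """Normalized weighted geometric mean a^(1-alpha) * b^alpha.

    For two Gaussians this is again Gaussian: precisions add with the
    weights, and the mean is the precision-weighted average.
    """
    precision = (1.0 - alpha) / var_a + alpha / var_b
    var = 1.0 / precision
    mu = var * ((1.0 - alpha) * mu_a / var_a + alpha * mu_b / var_b)
    return mu, var


def js_geometric(mu_q, var_q, mu_p, var_p, alpha=0.5):
    """Skew-geometric JS divergence (assumed convention), fully analytic."""
    mu_g, var_g = geometric_mean_gauss(mu_q, var_q, mu_p, var_p, alpha)
    return ((1.0 - alpha) * kl_gauss(mu_q, var_q, mu_g, var_g)
            + alpha * kl_gauss(mu_p, var_p, mu_g, var_g))


def gauss_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def js_arithmetic(mu_q, var_q, mu_p, var_p, n=20001):
    """Arithmetic JS divergence by trapezoidal integration.

    The mixture m = (q + p) / 2 is not Gaussian, so no closed form is
    used here. The value is bounded above by log(2).
    """
    sd = math.sqrt(max(var_q, var_p))
    lo = min(mu_q, mu_p) - 10.0 * sd   # grid covers both densities
    hi = max(mu_q, mu_p) + 10.0 * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        q = gauss_pdf(x, mu_q, var_q)
        p = gauss_pdf(x, mu_p, var_p)
        m = 0.5 * (q + p)
        term = 0.0
        if q > 0.0:                    # 0 * log(0) is taken as 0
            term += 0.5 * q * math.log(q / m)
        if p > 0.0:
            term += 0.5 * p * math.log(p / m)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * term * h
    return total


if __name__ == "__main__":
    var_q, mu_p, var_p = 0.01, 0.0, 0.1   # fixed values from the Figure 2 caption
    for mu_q in (0.1, 1.0, 5.0):          # 0.1 is the caption's value; the rest are illustrative
        print(mu_q,
              kl_gauss(mu_q, var_q, mu_p, var_p),
              js_geometric(mu_q, var_q, mu_p, var_p),
              js_arithmetic(mu_q, var_q, mu_p, var_p))
```

As $\mu_q$ moves away from $\mu_p$, the KL and geometric JS values keep growing, while the arithmetic JS value saturates near $\log 2$. The arithmetic version needs numerical integration here, whereas the geometric version uses only closed-form Gaussian expressions.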