Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method

Chenxu Yu, Wenqi Fang

Abstract

As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention for their solid theoretical foundations and strong potential in real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a high number of function evaluations (NFEs), resulting in heavy computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework together with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image classification and sequence modeling, achieving lower NFEs and improved predictive accuracy.

Paper Structure

This paper contains 15 sections, 12 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1:
  • Figure 2:
  • Figure 4: Predictive prior and posterior of Nesterov-SDEBNN on a non-monotonic toy dataset. The blue and red shaded regions denote the 95% confidence intervals of the prior and posterior, respectively, while the solid lines represent their corresponding mean predictions.
  • Figure 5: Comparison of test accuracy between SDE-BNN and Nesterov-SDEBNN: (Left) MNIST (Right) CIFAR-10.
  • Figure 6: Comparison of test NFEs between SDE-BNN and Nesterov-SDEBNN: (Left) MNIST (Right) CIFAR-10.
  • ...and 2 more figures