
Bayesian Additive Regression Networks

Danielle Van Boxel

TL;DR

Bayesian Additive Regression Networks (BARN) adapt the BART posterior sampling framework to an ensemble of single-hidden-layer neural networks for regression. By updating one network at a time against the residuals of the others and placing a Poisson prior on network size, BARN performs a neural architecture search within a Bayesian additive model. It achieves lower test RMSE on multiple benchmarks at the cost of increased computation. The approach is robust across diverse regression problems and synthetic scenarios, though the MCMC transitions still call for further theoretical analysis, and extensions to classification or to other network backbones remain open. Overall, BARN offers a principled way to combine Bayesian model search with neural networks to improve predictive accuracy in regression, and an open-source implementation is available.
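To make the Poisson prior on network size concrete, the following sketch shows one way a Metropolis-Hastings step could move a single network's hidden-neuron count. It is an illustration under stated assumptions, not the paper's implementation: the ±1 add/remove-a-neuron proposal, the truncation of the prior to at least one neuron, the rate `lam`, and the helper names `log_poisson_prior`, `mh_step_neuron_count`, and `sample_neuron_counts` are all hypothetical, and `log_lik(k)` is a caller-supplied stand-in for the log-likelihood of the residual data under a network with `k` neurons.

```python
import math
import random

def log_poisson_prior(k: int, lam: float) -> float:
    """Log pmf of a Poisson(lam) prior on a network's hidden-neuron count k."""
    if k < 0:
        return float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def mh_step_neuron_count(k, log_lik, lam, rng, min_neurons=1):
    """One Metropolis-Hastings step on the neuron count.

    Hypothetical assumptions (not taken from the paper): the proposal adds or
    removes one neuron with probability 1/2 each (a symmetric proposal), the
    prior is Poisson(lam) truncated to k >= min_neurons, and log_lik(k) returns
    the log-likelihood of the residual data for a network with k neurons.
    """
    k_new = k + 1 if rng.random() < 0.5 else k - 1
    if k_new < min_neurons:
        return k  # outside the truncated prior's support: reject
    log_ratio = (log_lik(k_new) + log_poisson_prior(k_new, lam)) - (
        log_lik(k) + log_poisson_prior(k, lam)
    )
    # Accept with probability min(1, exp(log_ratio)); the truncation constant cancels.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return k_new
    return k

def sample_neuron_counts(log_lik, lam, n_steps, burn_in, seed=0, k0=1):
    """Run the chain and keep the draws after burn_in steps."""
    rng = random.Random(seed)
    k, kept = k0, []
    for step in range(n_steps):
        k = mh_step_neuron_count(k, log_lik, lam, rng)
        if step >= burn_in:
            kept.append(k)
    return kept

if __name__ == "__main__":
    # With a flat likelihood the chain samples the truncated Poisson prior itself.
    draws = sample_neuron_counts(lambda k: 0.0, lam=3.0, n_steps=20000, burn_in=1000)
    print(sum(draws) / len(draws))  # should be close to 3 / (1 - exp(-3)), about 3.16
```

The flat-likelihood run is only a sanity check. In a BARN-style sampler, the likelihood term would pull the count toward sizes that fit the residuals better, and the draws kept after burn-in would approximate a posterior over neuron counts.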

Abstract

We apply Bayesian Additive Regression Tree (BART) principles to training an ensemble of small neural networks for regression tasks. Using Markov Chain Monte Carlo, we sample from the posterior distribution of neural networks that have a single hidden layer. To create an ensemble of these, we apply Gibbs sampling to update each network against the residual target value (i.e., the target minus the predictions of the other networks). We demonstrate the effectiveness of this technique on several benchmark regression problems, comparing it to equivalent shallow neural networks, BART, and ordinary least squares. Our Bayesian Additive Regression Networks (BARN) provide more consistent and often more accurate results. On test data benchmarks, BARN averaged between 5 and 20 percent lower root mean square error. This accuracy comes at the cost of greater computation time: BARN sometimes takes on the order of a minute where competing methods take a second or less. However, BARN without cross-validated hyperparameter tuning takes about the same computation time as the other methods with tuning, and it is still typically more accurate.
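The residual update described in the abstract is a form of backfitting. The following sketch shows only that bookkeeping: each member is refit against the target minus the summed predictions of all other members, and training RMSE is recorded after each sweep. As a deliberate simplification, it replaces BARN's MCMC update with a deterministic ridge least-squares fit of each member's output layer (hidden weights stay fixed at random values), so it neither samples a posterior nor searches over architectures. `TinyNet`, `backfit`, `solve_linear`, `rmse`, and all default sizes are hypothetical.

```python
import math
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

class TinyNet:
    """Hypothetical ensemble member: one hidden tanh layer with fixed random input
    weights. Only the output layer (plus bias) is refit, by ridge least squares;
    this stands in for BARN's MCMC update and is NOT the paper's procedure."""

    def __init__(self, n_hidden, n_inputs, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.out = [0.0] * (n_hidden + 1)  # output weights; last entry is the bias

    def features(self, x):
        h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in zip(self.w, self.b)]
        return h + [1.0]

    def predict(self, x):
        return sum(o * f for o, f in zip(self.out, self.features(x)))

    def fit(self, xs, targets, ridge=1e-3):
        phi = [self.features(x) for x in xs]
        p = len(self.out)
        gram = [[sum(r[i] * r[j] for r in phi) + (ridge if i == j else 0.0) for j in range(p)]
                for i in range(p)]
        rhs = [sum(r[i] * t for r, t in zip(phi, targets)) for i in range(p)]
        self.out = solve_linear(gram, rhs)

def rmse(ys, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ys, yhat)) / len(ys))

def backfit(xs, ys, n_members=4, n_hidden=3, sweeps=5, seed=0):
    """Residual backfitting: refit each member against y minus the other members' output."""
    rng = random.Random(seed)
    nets = [TinyNet(n_hidden, len(xs[0]), rng) for _ in range(n_members)]
    preds = [[0.0] * len(xs) for _ in nets]  # cached predictions of each member
    history = []
    for _ in range(sweeps):
        for j, net in enumerate(nets):
            resid = [y - sum(preds[k][i] for k in range(n_members) if k != j)
                     for i, y in enumerate(ys)]
            net.fit(xs, resid)
            preds[j] = [net.predict(x) for x in xs]
        fitted = [sum(p[i] for p in preds) for i in range(len(xs))]
        history.append(rmse(ys, fitted))  # training RMSE after this sweep
    return nets, history

if __name__ == "__main__":
    xs = [[-2.0 + 4.0 * i / 49] for i in range(50)]
    ys = [math.sin(x[0]) for x in xs]
    _, history = backfit(xs, ys)
    print(["%.4f" % h for h in history])
```

Because each member update here exactly minimizes the ridge-penalized squared error over that member's output weights, a sweep never increases the penalized training objective. BARN's MCMC updates are stochastic instead, so its error trace need not decrease monotonically.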

Paper Structure

This paper contains 9 sections, 6 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Typical error results for one of the data sets during the MCMC process, showing that burn-in is achieved within the full run
  • Figure 2: BARN is more adaptable to different problems than other methods and outperforms them across data sets
  • Figure 3: BARN resists overfitting similarly to other methods when comparing $R^2$ values across training and testing data with pooled variance
  • Figure 4: BARN is competitive with other methods across data sets
  • Figure 5: Posterior distribution of neuron counts varies across data sets
  • ...and 1 more figure