Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation

Bernhard Klein, Falk Selker, Hendrik Borras, Sophie Steger, Franz Pernkopf, Holger Fröning

TL;DR

This work tackles the difficulty of deploying uncertainty-aware Bayesian neural networks (BNNs) on resource-limited devices by building on the Probabilistic Forward Pass (PFP), an analytic, single-pass method that propagates Gaussian weight and activation distributions instead of sampling. The authors extend the TVM compiler with a dedicated PFP operator library and tuned execution schedules to run PFP-based BNNs efficiently on embedded ARM CPUs. Empirical results show speedups of up to 4200x over sampling-based SVI-BNNs for small mini-batches, with comparable accuracy, uncertainty estimation, and out-of-domain (OOD) detection on Dirty-MNIST. Overall, the study shows how probabilistic modeling and code generation can bridge the gap between costly Bayesian methods and constrained hardware, enabling uncertainty-aware inference at the edge.

Abstract

Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confident yet incorrect predictions. Bayesian neural networks (BNNs) address this by providing probabilistic estimates, but incur high computational cost because predictions require sampling weight distributions and multiple forward passes. The Probabilistic Forward Pass (PFP) offers a highly efficient approximation to Stochastic Variational Inference (SVI) by assuming Gaussian-distributed weights and activations, enabling fully analytic uncertainty propagation and replacing sampling with a single deterministic forward pass. We present an end-to-end pipeline for training, compiling, optimizing, and deploying PFP-based BNNs on embedded ARM CPUs. Using the TVM deep learning compiler, we implement a dedicated library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, combined with manual and automated tuning strategies. Ablation studies show that PFP consistently outperforms SVI in computational efficiency, achieving speedups of up to 4200x for small mini-batches. PFP-BNNs match SVI-BNNs on Dirty-MNIST in accuracy, uncertainty estimation, and OOD detection while greatly reducing compute cost. These results highlight the potential of combining Bayesian approximations with code generation to enable efficient BNN deployment on resource-constrained systems.
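The analytic propagation described in the abstract can be illustrated with a minimal sketch. It assumes independent Gaussian inputs and weights, omits the bias term, and uses the textbook moment formulas for a linear layer and a ReLU; the function names (`dense_pfp`, `dense_variance_raw_moments`, `relu_moment_match`) are hypothetical and are not taken from the paper or from TVM. The second function writes the same variance in terms of raw second moments, an algebraic identity that may or may not coincide with the operator reformulation the paper benchmarks.

```python
import math


def dense_pfp(mu_x, var_x, mu_w, var_w):
    """Mean and variance of y_j = sum_i x_i * w_ij for independent Gaussians.

    mu_x, var_x: input means/variances, length n_in.
    mu_w, var_w: weight means/variances, nested lists of shape n_in x n_out.
    """
    n_in, n_out = len(mu_w), len(mu_w[0])
    mu_y, var_y = [], []
    for j in range(n_out):
        mu_y.append(sum(mu_x[i] * mu_w[i][j] for i in range(n_in)))
        # Var(x * w) for independent x and w, summed over all inputs.
        var_y.append(sum(
            var_x[i] * var_w[i][j]
            + var_x[i] * mu_w[i][j] ** 2
            + mu_x[i] ** 2 * var_w[i][j]
            for i in range(n_in)
        ))
    return mu_y, var_y


def dense_variance_raw_moments(mu_x, var_x, mu_w, var_w):
    """Same variance, written as E[x^2] E[w^2] - E[x]^2 E[w]^2."""
    n_in, n_out = len(mu_w), len(mu_w[0])
    ex2 = [var_x[i] + mu_x[i] ** 2 for i in range(n_in)]
    return [
        sum(
            ex2[i] * (var_w[i][j] + mu_w[i][j] ** 2)
            - (mu_x[i] * mu_w[i][j]) ** 2
            for i in range(n_in)
        )
        for j in range(n_out)
    ]


def relu_moment_match(mu, var):
    """Mean and variance of max(0, x) for x ~ N(mu, var)."""
    if var <= 0.0:
        # Degenerate (deterministic) input: ReLU acts pointwise.
        return max(mu, 0.0), 0.0
    sigma = math.sqrt(var)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + var) * cdf + mu * sigma * pdf
    # Clamp to guard against tiny negative values from rounding.
    return mean, max(second_moment - mean * mean, 0.0)


# Illustrative values only: one dense layer followed by a ReLU.
mu_x, var_x = [1.0, -2.0], [0.25, 0.5]
mu_w = [[0.5, -1.0], [0.25, 0.5]]
var_w = [[0.01, 0.04], [0.09, 0.01]]

mu_h, var_h = dense_pfp(mu_x, var_x, mu_w, var_w)
activations = [relu_moment_match(m, v) for m, v in zip(mu_h, var_h)]
print(activations)
```

Each layer passes only a mean and a variance per activation to the next one, which is why one deterministic pass can stand in for many sampled passes. The price is the Gaussian and independence assumptions made at every step.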

Paper Structure

This paper contains 20 sections, 11 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: (a) Exemplary predictions from an SVI-BNN (blue), its Gaussian representation, and PFP [roth2016pfp], showing MNIST [Lecun1998mnist] and Ambiguous-MNIST [Mukhoti2022dirtyMNIST] as in-domain examples and Fashion-MNIST [xiao2017fashionMnist] as an out-of-domain sample. Variability in class predictions indicates aleatoric uncertainty (higher Softmax Entropy, SME), whereas variability across predictive samples indicates epistemic uncertainty (higher Mutual Information, MI). SVI and PFP both quantify these uncertainties effectively. Note: only three SVI samples are shown, which is insufficient for robust estimation. (b) Influence of the number of predictive samples on the uncertainty metrics. Softmax Entropy (aleatoric uncertainty) remains stable, while Total Predictive Uncertainty and Mutual Information (epistemic uncertainty) require more samples for reliable OOD detection, especially for out-of-domain data (Fashion-MNIST).
  • Figure 2: Illustration of Gaussian moment matching for ReLU activations. The true distribution (solid line) is approximated as a Gaussian (dashed line). Reproduced with permission from [roth2016pfp].
  • Figure 3: Comparison of SVI and PFP uncertainty predictions. For MNIST, both uncertainties are expected to be low; Ambiguous-MNIST exhibits higher aleatoric uncertainty (Softmax Entropy), and Fashion-MNIST, as OOD data, shows higher epistemic uncertainty (Mutual Information). Both methods effectively assign the majority of images to their respective domains.
  • Figure 4: Disentangled Epistemic (MI) and Aleatoric (SME) Uncertainty. Comparison of SVI and PFP uncertainty predictions. While PFP performs comparably to SVI in estimating total uncertainty, its ability to disentangle aleatoric and epistemic uncertainties is somewhat limited, as anticipated. Nevertheless, the practical distinction remains sufficiently robust for most cases.
  • Figure 5: Performance comparison of operator implementations, evaluating the reformulation from Equation \ref{eq:pfp_dense_scalar:variance} to Equation \ref{eq:pfp_dense_scalar:E:variance} and the use of separate vs. joint operators for the mean and variance paths on an ARM Cortex-A72.
  • ...and 2 more figures
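The uncertainty measures named in the figure captions can be sketched for the sampling-based case. This assumes the common entropy decomposition: total predictive uncertainty is the entropy of the sample-averaged softmax, Softmax Entropy (SME) is taken here as the average entropy of the individual samples, and Mutual Information (MI) is their difference. Whether the paper defines SME exactly this way is an assumption, and the helper names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def uncertainty_from_samples(probs):
    """Decompose predictive uncertainty from sampled softmax outputs.

    probs: list of S softmax vectors, one per sampled forward pass.
    Returns (total, sme, mi).
    """
    n_samples, n_classes = len(probs), len(probs[0])
    mean_p = [sum(p[c] for p in probs) / n_samples for c in range(n_classes)]
    total = entropy(mean_p)                             # total predictive uncertainty
    sme = sum(entropy(p) for p in probs) / n_samples    # aleatoric part (assumed SME)
    return total, sme, total - sme                      # MI = epistemic part


# Illustrative inputs: samples that agree on an ambiguous prediction
# versus samples that are individually confident but disagree.
ambiguous = [[0.5, 0.5], [0.5, 0.5]]
disagreeing = [[1.0, 0.0], [0.0, 1.0]]

print(uncertainty_from_samples(ambiguous))
print(uncertainty_from_samples(disagreeing))
```

In the first case all uncertainty is attributed to SME and MI is zero, matching the description of ambiguous in-domain digits. In the second case SME is zero and all uncertainty is MI, the pattern the captions associate with out-of-domain inputs. Because MI depends on the spread across samples, it is also the quantity most sensitive to the number of samples drawn.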