Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation
Bernhard Klein, Falk Selker, Hendrik Borras, Sophie Steger, Franz Pernkopf, Holger Fröning
TL;DR
This work tackles the difficulty of deploying uncertainty-aware Bayesian neural networks (BNNs) on resource-limited devices by introducing the Probabilistic Forward Pass (PFP), an analytic, single-pass method that propagates Gaussian weight and activation distributions. The authors extend the TVM compiler with a dedicated PFP operator library and optimized execution schedules to run PFP-based BNNs efficiently on ARM CPUs. Empirical results show that PFP delivers speedups of up to 4200x over sampling-based BNNs for small mini-batches while maintaining comparable accuracy and out-of-domain (OOD) detection on Dirty-MNIST. Overall, the study demonstrates how probabilistic modeling and code generation can bridge the gap between complex Bayesian methods and constrained hardware, illustrating a practical pathway for uncertainty-aware inference at the edge.
Abstract
Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confident yet incorrect predictions. Bayesian neural networks (BNNs) address this by providing probabilistic estimates, but incur high computational cost because each prediction requires sampling from the weight distributions and running multiple forward passes. The Probabilistic Forward Pass (PFP) offers a highly efficient approximation to Stochastic Variational Inference (SVI) by assuming Gaussian-distributed weights and activations, enabling fully analytic uncertainty propagation and replacing sampling with a single deterministic forward pass. We present an end-to-end pipeline for training, compiling, optimizing, and deploying PFP-based BNNs on embedded ARM CPUs. Using the TVM deep learning compiler, we implement a dedicated library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, combined with manual and automated tuning strategies. Ablation studies show that PFP consistently outperforms SVI in computational efficiency, achieving speedups of up to 4200x for small mini-batches. PFP-BNNs match SVI-BNNs on Dirty-MNIST in accuracy, uncertainty estimation, and OOD detection while greatly reducing compute cost. These results highlight the potential of combining Bayesian approximations with code generation to enable efficient BNN deployment on resource-constrained systems.
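To make the idea of analytic uncertainty propagation concrete, the following is a minimal sketch of how a mean and a variance could be pushed through one dense layer and a ReLU in a single pass. It assumes independent Gaussian weights, biases, and inputs, and uses standard Gaussian moment matching for the ReLU. The function names `pfp_dense` and `pfp_relu` are hypothetical, the code is plain Python for readability, and it is not the authors' TVM operator library.

```python
import math


def pfp_dense(x_mean, x_var, w_mean, w_var, b_mean, b_var):
    """Propagate means and variances through y = W x + b.

    Assumption: all weights, biases, and inputs are independent Gaussians.
    w_mean and w_var are lists of rows (one row per output unit).
    """
    out_mean, out_var = [], []
    for i in range(len(w_mean)):
        m = b_mean[i]
        v = b_var[i]
        for j in range(len(x_mean)):
            m += w_mean[i][j] * x_mean[j]
            # Var[w*x] = E[w^2] E[x^2] - E[w]^2 E[x]^2 for independent w and x,
            # expanded into terms of the given means and variances.
            v += (w_var[i][j] * (x_mean[j] ** 2 + x_var[j])
                  + w_mean[i][j] ** 2 * x_var[j])
        out_mean.append(m)
        out_var.append(v)
    return out_mean, out_var


def pfp_relu(mean, var):
    """Moment-match ReLU(a) for a ~ N(mean, var), element by element."""
    out_mean, out_var = [], []
    for m, v in zip(mean, var):
        if v <= 0.0:
            # Deterministic input: ReLU acts as usual, no uncertainty.
            out_mean.append(max(m, 0.0))
            out_var.append(0.0)
            continue
        s = math.sqrt(v)
        z = m / s
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        first = m * cdf + s * pdf                   # E[ReLU(a)]
        second = (m * m + v) * cdf + m * s * pdf    # E[ReLU(a)^2]
        out_mean.append(first)
        out_var.append(max(second - first * first, 0.0))
    return out_mean, out_var


# One probabilistic layer: a single pass yields a mean and a variance per unit.
h_mean, h_var = pfp_dense(
    x_mean=[1.0, 2.0], x_var=[0.5, 0.0],
    w_mean=[[1.0, -1.0]], w_var=[[0.1, 0.2]],
    b_mean=[0.0], b_var=[0.0],
)
a_mean, a_var = pfp_relu(h_mean, h_var)
```

A sampling-based predictor would instead draw many weight samples and run one forward pass per sample; here every layer is evaluated once and carries two values per activation, which is the source of the cost reduction described in the abstract.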
