Collapsed Inference for Bayesian Deep Learning

Zhe Zeng, Guy Van den Broeck

TL;DR

The paper addresses uncertainty estimation in Bayesian deep learning by reframing Bayesian model averaging (BMA) as a weighted volume computation (WVC) problem. It introduces CIBER, a collapsed inference method that blends SGD-based posterior sampling with exact marginalization over a collapsed subset of weights using weighted model integration (WMI) solved via SMT encodings. Key contributions include (i) establishing the BMA–WVC connection, (ii) designing a practical collapsed sampling scheme that uses uniform conditional posteriors and triangular predictive densities, (iii) proving that collapsed integrals can be computed exactly with WMI solvers, and (iv) demonstrating state-of-the-art uncertainty estimation and predictive performance on regression and image-classification benchmarks. The approach achieves improved sample efficiency and scalability without sacrificing accuracy, with strong results on small/large UCI datasets and CIFAR transfer tasks. This work opens avenues for broader SMT/WMI encodings and improved solver capabilities to advance Bayesian deep learning in practice.
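The BMA integral that the summary refers to can be written out as follows. The first two lines are the standard definitions of the BMA predictive distribution and expected prediction; the integrand of the second matches the weighted prediction plotted in Figure 1. The third line is only a sketch of a generic collapsed estimator: the split of the weights into a sampled part $\boldsymbol{w}_s$ and a collapsed part $\boldsymbol{w}_c$, the sample count $K$, and the conditional $q$ are notation assumed here for illustration, not taken from the paper.

```latex
\begin{align*}
% BMA predictive distribution
p(y \mid \boldsymbol{x}, \mathcal{D})
  &= \int p(y \mid \boldsymbol{x}, \boldsymbol{w}) \, p(\boldsymbol{w} \mid \mathcal{D}) \, d\boldsymbol{w} \\
% Expected prediction: the volume under the surface y p(y | x, w) p(w | D)
\mathbb{E}[y \mid \boldsymbol{x}, \mathcal{D}]
  &= \int\!\!\int y \, p(y \mid \boldsymbol{x}, \boldsymbol{w}) \, p(\boldsymbol{w} \mid \mathcal{D}) \, dy \, d\boldsymbol{w} \\
% Collapsed estimator (assumed notation): sample w_s, integrate w_c in closed form
p(y \mid \boldsymbol{x}, \mathcal{D})
  &\approx \frac{1}{K} \sum_{k=1}^{K} \int p(y \mid \boldsymbol{x}, \boldsymbol{w}_s^{(k)}, \boldsymbol{w}_c) \,
     q(\boldsymbol{w}_c \mid \boldsymbol{w}_s^{(k)}) \, d\boldsymbol{w}_c,
  \qquad \boldsymbol{w}_s^{(k)} \sim p(\boldsymbol{w}_s \mid \mathcal{D})
\end{align*}
```

Each term of the sum covers a continuum of models rather than a single one, which is the sense in which a collapsed sample is more sample-efficient than a plain Monte Carlo sample.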

Abstract

Bayesian neural networks (BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while the alternatives tend to be prohibitively expensive computationally. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte-Carlo sample by limiting sampling to a subset of the network weights while pairing it with some closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently despite the non-linearity of neural networks by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation as well as predictive performance.
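To make the idea of analytic marginalization through a non-linearity concrete, here is a minimal Python sketch for a single ReLU unit. It is a toy illustration under stated assumptions, not the paper's WMI-based solver: one weight `w` is collapsed under a uniform distribution on `[lo, hi]`, the offset `c` stands in for the contribution of the sampled weights, and the function names (`expected_relu_uniform`, `mc_relu_uniform`, `collapsed_estimate`) are hypothetical.

```python
import random


def relu(z):
    return max(0.0, z)


def expected_relu_uniform(x, c, lo, hi):
    """Exact E[ReLU(w * x + c)] for w ~ Uniform(lo, hi), with lo <= hi."""
    if x == 0.0:
        return relu(c)
    # The pre-activation z = w * x + c is uniform on [z_lo, z_hi].
    z_lo, z_hi = sorted((lo * x + c, hi * x + c))
    if z_hi <= 0.0:
        return 0.0  # ReLU is zero on the whole interval
    if z_lo >= 0.0:
        return 0.5 * (z_lo + z_hi)  # ReLU is the identity on the whole interval
    # Interval straddles zero: integrate z over [0, z_hi], divide by the length.
    return 0.5 * z_hi ** 2 / (z_hi - z_lo)


def mc_relu_uniform(x, c, lo, hi, n, rng):
    """Plain Monte Carlo estimate of the same expectation with n draws of w."""
    return sum(relu(rng.uniform(lo, hi) * x + c) for _ in range(n)) / n


def collapsed_estimate(x, offsets, lo, hi):
    """Average the exact inner expectation over sampled offsets c."""
    return sum(expected_relu_uniform(x, c, lo, hi) for c in offsets) / len(offsets)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed stand-in for samples of the non-collapsed weights.
    offsets = [rng.gauss(0.0, 1.0) for _ in range(50)]
    print("collapsed:", collapsed_estimate(2.0, offsets, -1.0, 1.0))
    print("exact inner term:", expected_relu_uniform(2.0, -1.0, -1.0, 1.0))
    print("MC inner term:", mc_relu_uniform(2.0, -1.0, -1.0, 1.0, 1000, rng))
```

The closed form works because the ReLU splits the weight interval into at most two linear pieces, so the integral is a sum of polynomial integrals over intervals. This is a one-dimensional analogue of the piecewise structure that volume computation solvers exploit.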

Paper Structure

This paper contains 20 sections, 1 theorem, 20 equations, 5 figures, 10 tables, and 1 algorithm.

Key Result

Proposition 7

Let the SMT formula $\Delta = \Delta_{\mathsf{ReLU}} \land \Delta_{\mathit{pos}} \land \Delta_{\mathit{pred}}$, and the set of weights $\Phi = \Phi_{\mathit{pos}} \cup \Phi_{\mathit{pred}}$ as defined in the section on the SMT encoding. Let the

Figures (5)

  • Figure 1: The integral surface of (a) the expected prediction in BMA, and (b) our proposed approximation. Both are highly non-convex and multi-modal. The z-axis is the weighted prediction $y~p(y \mid \boldsymbol{x}, \boldsymbol{w})~p(\boldsymbol{w} \mid \mathcal{D})$. Integration of (a) does not admit a closed-form solution, yet integration of (b) is a close approximation that can be solved exactly and efficiently by WMI solvers.
  • Figure 2: Uncertainty estimates for regression. The red line is the ground truth. The dark blue line shows the predictive mean. The shaded region is the $90\%$ confidence interval of the predictive distribution. For the same number of samples, (b) CIBER is closer than (a) small-sample HMC to (c) a highly accurate but slow HMC with a large number of samples. See the Appendix for details.
  • Figure 3: Posterior predictive distributions in Bayesian linear regression. The $y$-axis shows the absolute difference between an estimated predictive distribution $p(y \mid \mathbf{x})$ and the ground-truth predictive distribution $q(y \mid \mathbf{x})$. Shaded regions are the $95\%$ confidence interval. Best viewed in color.
  • Figure 4: KL divergence in Bayesian linear regression. The $x$-axis shows the number of samples the MC method uses for estimations, ranging from $50$ to $150$. The blue curve shows the MC method, and the green dashed curve shows CIBER using $50$ samples.
  • Figure :

Theorems & Definitions (8)

  • Definition 1: WVC
  • Example 2
  • Example 3
  • Example 4
  • Definition 5
  • Definition 6
  • Proposition 7
  • proof