Collapsed Inference for Bayesian Deep Learning

Zhe Zeng; Guy Van den Broeck

TL;DR

The paper addresses uncertainty estimation in Bayesian deep learning by reframing Bayesian model averaging (BMA) as a weighted volume computation (WVC) problem. It introduces CIBER, a collapsed inference method that blends SGD-based posterior sampling with exact marginalization over a collapsed subset of weights using weighted model integration (WMI) solved via SMT encodings. Key contributions include (i) establishing the BMA–WVC connection, (ii) designing a practical collapsed sampling scheme that uses uniform conditional posteriors and triangular predictive densities, (iii) proving that collapsed integrals can be computed exactly with WMI solvers, and (iv) demonstrating state-of-the-art uncertainty estimation and predictive performance on regression and image-classification benchmarks. The approach achieves improved sample efficiency and scalability without sacrificing accuracy, with strong results on small/large UCI datasets and CIFAR transfer tasks. This work opens avenues for broader SMT/WMI encodings and improved solver capabilities to advance Bayesian deep learning in practice.
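As a rough illustration of the idea summarized above, the standard BMA integral and its two estimators can be written as follows. The notation here is an assumption made for this sketch and may differ from the paper's: $\bm{w}_s$ is the sampled subset of weights, $\bm{w}_c$ the collapsed subset, $q$ the approximate posterior, and $T$ the number of samples.

$$
p(y \mid \bm{x}, \mathcal{D}) = \int p(y \mid \bm{x}, \bm{w})\, p(\bm{w} \mid \mathcal{D})\, d\bm{w}
\;\approx\; \frac{1}{T} \sum_{t=1}^{T} p\big(y \mid \bm{x}, \bm{w}^{(t)}\big), \qquad \bm{w}^{(t)} \sim q(\bm{w})
$$

$$
p(y \mid \bm{x}, \mathcal{D}) \;\approx\; \frac{1}{T} \sum_{t=1}^{T} \int p\big(y \mid \bm{x}, \bm{w}_s^{(t)}, \bm{w}_c\big)\, q\big(\bm{w}_c \mid \bm{w}_s^{(t)}\big)\, d\bm{w}_c, \qquad \bm{w}_s^{(t)} \sim q(\bm{w}_s)
$$

The first line is the plain Monte Carlo estimate. The second is the collapsed form, in which each sample carries an inner integral over $\bm{w}_c$; according to the summary, this inner integral is the part handed to a WMI solver.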

Abstract

Bayesian neural networks (BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while the alternatives tend to be computationally prohibitive. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte Carlo sample by limiting sampling to a subset of the network weights while pairing it with a closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently, despite the non-linearity of neural networks, by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation as well as predictive performance.
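The sample-efficiency claim in the abstract can be illustrated with a toy sketch. This is not the CIBER implementation: it assumes a single ReLU unit with two weights, a Gaussian over the sampled weight, and a uniform conditional over the collapsed weight, and it replaces the WMI solver with a hand-derived closed form. All function names and distribution parameters below are hypothetical.

```python
import numpy as np

# Toy model (an assumption for illustration): prediction = ReLU(w_s * x0 + w_c * x1)
# with w_s ~ Normal(1.0, 0.5) and w_c ~ Uniform(-1.0, 3.0).
LO, HI = -1.0, 3.0


def relu_mean_uniform(a, b, lo, hi):
    """Exact E[ReLU(a + b * w)] for w ~ Uniform(lo, hi)."""
    if b == 0.0:
        return max(a, 0.0)
    # z = a + b * w is uniform on [z_lo, z_hi].
    z_lo, z_hi = sorted((a + b * lo, a + b * hi))
    if z_hi <= 0.0:
        return 0.0
    z_pos = max(z_lo, 0.0)  # ReLU zeroes out the negative part of the range
    return (z_hi**2 - z_pos**2) / (2.0 * (z_hi - z_lo))


def mc_estimate(x, n_samples, rng):
    """Plain Monte Carlo: sample both weights and average the predictions."""
    w_s = rng.normal(1.0, 0.5, size=n_samples)
    w_c = rng.uniform(LO, HI, size=n_samples)
    return float(np.mean(np.maximum(w_s * x[0] + w_c * x[1], 0.0)))


def collapsed_estimate(x, n_samples, rng):
    """Collapsed: sample only w_s, integrate w_c out exactly for each sample."""
    w_s = rng.normal(1.0, 0.5, size=n_samples)
    return float(np.mean([relu_mean_uniform(w * x[0], x[1], LO, HI) for w in w_s]))


def compare(x=(1.0, 0.5), n_samples=50, n_repeats=200, seed=0):
    """Return (mean, variance) of each estimator across repeated runs."""
    rng = np.random.default_rng(seed)
    mc = [mc_estimate(x, n_samples, rng) for _ in range(n_repeats)]
    co = [collapsed_estimate(x, n_samples, rng) for _ in range(n_repeats)]
    return (np.mean(mc), np.var(mc)), (np.mean(co), np.var(co))


if __name__ == "__main__":
    (mc_mean, mc_var), (co_mean, co_var) = compare()
    print(f"MC:        mean={mc_mean:.4f}  var={mc_var:.6f}")
    print(f"Collapsed: mean={co_mean:.4f}  var={co_var:.6f}")
```

Both estimators target the same expectation, but each collapsed sample averages over every value of the collapsed weight at once, so its variance across runs is lower for the same sample count. In a real network the inner integral has no such simple closed form, which is where the abstract's volume computation solvers come in.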

Paper Structure (20 sections, 1 theorem, 20 equations, 5 figures, 10 tables, 1 algorithm)

Introduction
Bayesian Model Averaging as Weighted Volume Computation
A Warm-Up Example
General Reduction of BMA to WVC
Approximating BMA by WMI
CIBER: Collapsed Inference for Bayesian Deep Learning via WMI
Approximation to Posteriors
Encoding into WMI Problems
Exact Integration in Collapsed BMA
Related Work
Experiments
Regression on Small and Large UCI Datasets
Image Classification
Conclusions And Future Work
Proofs
...and 5 more sections

Key Result

Proposition 7

Let the SMT formula $\Delta = \Delta_{\mathsf{ReLU}} \land \Delta_{\mathit{pos}} \land \Delta_{\mathit{pred}}$ and the set of weights $\Phi = \Phi_{\mathit{pos}} \cup \Phi_{\mathit{pred}}$ be as defined in the section on the SMT encoding. Let the

Figures (5)

Figure 1: The integral surface of (a) the expected prediction in BMA, and (b) our proposed approximation. Both are highly non-convex and multi-modal. The z-axis is the weighted prediction $y~p(y \mid \boldsymbol{x}, \boldsymbol{w})~p(\boldsymbol{w} \mid \mathcal{D})$. Integration of (a) does not admit a closed-form solution, yet integration of (b) is a close approximation that can be solved exactly and efficiently by WMI solvers.
Figure 2: Uncertainty estimates for regression. The red line is the ground truth. The dark blue line shows the predictive mean. The shaded region is the $90\%$ confidence interval of the predictive distribution. For the same number of samples, (b) CIBER is closer than (a) small-sample HMC to (c) a highly accurate but slow HMC with a large number of samples. See the Appendix for details.
Figure 3: Posterior predictive distributions in Bayesian linear regression. The $y$-axis shows the absolute difference between an estimated predictive distribution $p(y \mid \mathbf{x})$ and the ground-truth predictive distribution $q(y \mid \mathbf{x})$. Shaded regions are the $95\%$ confidence interval. Best viewed in color.
Figure 4: KL divergence in Bayesian linear regression. The $x$-axis shows the number of samples the MC method uses for estimations, ranging from $50$ to $150$. The blue curve shows the MC method, and the green dashed curve shows CIBER using $50$ samples.

Theorems & Definitions (8)

Definition 1: WVC
Example 2
Example 3
Example 4
Definition 5
Definition 6
Proposition 7
proof
