Gradient-free variational learning with conditional mixture networks
Conor Heins, Hao Wu, Dimitrije Markovic, Alexander Tschantz, Jeff Beck, Christopher Buckley
TL;DR
The paper addresses the challenge of obtaining calibrated predictions and uncertainty quantification in Bayesian neural-network-like models without prohibitive computation. It introduces CAVI-CMN, a gradient-free variational learning algorithm for training two-layer conditional mixture networks. The algorithm exploits conditional conjugacy and Pólya-Gamma augmentation to obtain Gaussian posteriors over the weights through analytic updates. Empirically, CAVI-CMN matches or exceeds the predictive performance of gradient-based MLE while delivering full posterior distributions and calibrated predictions, with runtimes competitive with BBVI and NUTS and favorable scaling to larger models. The approach offers a practical, online-friendly Bayesian alternative for fast probabilistic networks and suggests extensions to deeper architectures and minibatch or streaming learning.
Abstract
Balancing computational efficiency with robust predictive performance is crucial in supervised learning, especially for critical applications. Standard deep learning models, while accurate and scalable, often lack probabilistic features like calibrated predictions and uncertainty quantification. Bayesian methods address these issues but can be computationally expensive as model and data complexity increase. Previous work shows that fast variational methods can reduce the compute requirements of Bayesian methods by eliminating the need for gradient computation or sampling, but are often limited to simple models. We introduce CAVI-CMN, a fast, gradient-free variational method for training conditional mixture networks (CMNs), a probabilistic variant of the mixture-of-experts (MoE) model. CMNs are composed of linear experts and a softmax gating network. By exploiting conditional conjugacy and Pólya-Gamma augmentation, we furnish Gaussian likelihoods for the weights of both the linear layers and the gating network. This enables efficient variational updates using coordinate ascent variational inference (CAVI), avoiding traditional gradient-based optimization. We validate this approach by training two-layer CMNs on standard classification benchmarks from the UCI repository. CAVI-CMN achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation, while maintaining competitive runtime and full posterior distributions over all model parameters. Moreover, as input size or the number of experts increases, computation time scales competitively with MLE and other gradient-based solutions like black-box variational inference (BBVI), making CAVI-CMN a promising tool for deep, fast, and gradient-free Bayesian networks.
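
To make the idea of gradient-free, conjugate updates concrete, the following is a minimal sketch of coordinate ascent variational inference with Pólya-Gamma augmentation. It assumes a much simpler setting than the paper's: a single binary logistic regression with a zero-mean isotropic Gaussian prior, rather than a full CMN with a multi-class softmax gate and linear experts. The function name `cavi_logistic_regression` and the parameters `prior_var` and `n_iters` are hypothetical choices for illustration; this is not the authors' implementation.

```python
import numpy as np


def cavi_logistic_regression(X, y, prior_var=10.0, n_iters=50):
    """CAVI for Bayesian binary logistic regression (illustrative sketch).

    Assumptions (not from the paper): labels y in {0, 1}, prior
    w ~ N(0, prior_var * I), and a factorized posterior q(w) q(omega)
    with one Polya-Gamma auxiliary variable omega_i per data point.

    Returns the mean and covariance of the Gaussian factor q(w).
    """
    n, d = X.shape
    prior_prec = np.eye(d) / prior_var
    kappa = y - 0.5                      # linear term after augmentation
    mu = np.zeros(d)
    Sigma = prior_var * np.eye(d)

    for _ in range(n_iters):
        # Update q(omega_i) = PG(1, c_i) with c_i^2 = E_q[(x_i^T w)^2].
        second_moment = Sigma + np.outer(mu, mu)
        quad = np.einsum("nd,de,ne->n", X, second_moment, X)
        c = np.sqrt(np.maximum(quad, 1e-16))
        # Mean of PG(1, c): tanh(c / 2) / (2 c), which tends to 1/4 as c -> 0.
        e_omega = np.tanh(c / 2.0) / (2.0 * c)

        # Update q(w) = N(mu, Sigma): closed form because, given omega,
        # the augmented likelihood is Gaussian in w.
        Sigma = np.linalg.inv(prior_prec + X.T @ (e_omega[:, None] * X))
        mu = Sigma @ (X.T @ kappa)

    return mu, Sigma
```

Each iteration alternates two closed-form updates and never computes a gradient or draws a sample, which is the property the abstract attributes to CAVI-CMN. Calling `mu, Sigma = cavi_logistic_regression(X, y)` on an `(n, d)` design matrix and a length-`n` vector of 0/1 labels returns a full Gaussian posterior over the weights rather than a point estimate.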
