Variational Inference for Quantum HyperNetworks
Luca Nepote, Alix Lhéritier, Nicolas Bondoux, Marios Kountouris, Maurizio Filippone
TL;DR
This work links Quantum HyperNetworks with Bayesian inference by deriving an explicit ELBO and a surrogate ELBO (SELBO) for training binary-weight networks (BiNNs) via variational principles. BiNN weights are mapped to quantum circuit measurement outcomes, and the approach covers both the case where the full output distribution is accessible and the case where the distribution is implicit and only samples are available. In both cases it provides principled regularization that improves trainability and generalization over standard MLE. Empirical results on toy datasets show that SELBO can yield higher accuracy and smoother optimization, suggesting practical benefits for quantum-inspired training of low-precision networks. The framework sets the stage for future hardware validation, scalability studies, and exploration of alternative divergences in quantum variational inference.
Abstract
Binary Neural Networks (BiNNs), which employ single-bit precision weights, have emerged as a promising solution to reduce memory usage and power consumption while maintaining competitive performance in large-scale systems. However, training BiNNs remains a significant challenge due to the limitations of conventional training algorithms. Quantum HyperNetworks offer a novel paradigm for enhancing the optimization of BiNNs by leveraging quantum computing. Specifically, a Variational Quantum Algorithm is employed to generate binary weights through quantum circuit measurements, while key quantum phenomena such as superposition and entanglement facilitate the exploration of a broader solution space. In this work, we establish a connection between this approach and Bayesian inference: we derive the Evidence Lower Bound (ELBO) for the case where direct access to the output distribution is available (i.e., in simulations), and we introduce a surrogate ELBO based on the Maximum Mean Discrepancy (MMD) metric for scenarios involving implicit distributions, as commonly encountered in practice. Our experimental results demonstrate that the proposed methods outperform standard Maximum Likelihood Estimation (MLE), improving trainability and generalization.
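
To make the two objectives mentioned above concrete, the following is a minimal NumPy sketch, not the formulation used in this work. It assumes the generic variational form ELBO = E_q[log p(D|w)] - KL(q || p) when the distribution q over bitstrings is explicit, and a sample-based surrogate in which the KL term is swapped for a squared-MMD estimate between weight samples and prior samples. The function names (elbo, mmd2, selbo), the RBF kernel, the bandwidth, the weight lam, and the biased estimator are all illustrative assumptions.

```python
import numpy as np


def elbo(q_probs, log_lik, prior_probs):
    """ELBO for an explicit distribution over a finite set of bitstrings.

    q_probs[i]     : probability of bitstring i under the variational distribution
    log_lik[i]     : log-likelihood of the data given the weights encoded by bitstring i
    prior_probs[i] : prior probability of bitstring i (assumed > 0 wherever q > 0)
    """
    q = np.asarray(q_probs, dtype=float)
    p = np.asarray(prior_probs, dtype=float)
    ll = np.asarray(log_lik, dtype=float)
    mask = q > 0  # convention: 0 * log 0 = 0
    kl = np.sum(q[mask] * np.log(q[mask] / p[mask]))
    return float(np.sum(q * ll) - kl)


def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel matrix between rows of x and rows of y (assumed kernel choice)."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def selbo(weight_samples, log_lik_samples, prior_samples, lam=1.0):
    """Sample-based surrogate: mean log-likelihood minus lam * squared MMD.

    Only samples are needed, so this applies when q is implicit.
    lam is a hypothetical regularization weight introduced for this sketch.
    """
    fit = float(np.mean(log_lik_samples))
    return fit - lam * mmd2(weight_samples, prior_samples)
```

In this sketch, weight_samples would hold measured bitstrings as rows of 0/1 values and prior_samples would hold draws from a chosen prior, for example independent fair coin flips. Both objectives are to be maximized, and the second term acts as the regularizer that plain MLE lacks.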
