Subtractive Mixture Models via Squaring: Representation and Learning
Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari
TL;DR
This work introduces subtractive mixture models obtained by squaring a linear combination of base components within probabilistic circuits. Squaring guarantees non-negativity even when some weights are negative, while keeping normalization tractable. By embedding squared mixtures in deep, tensorized circuit architectures and enforcing structured-decomposability, the authors derive an efficient squaring procedure and numerically stable inference techniques, yielding squared non-monotonic probabilistic circuits (NPC²s). The authors also relate NPC²s to positive semidefinite (PSD) models and Born machines, showing through reductions that NPC²s generalize both. Theoretically, they prove that NPC²s can be exponentially more expressive than traditional additive mixture models (MMs). Empirically, they show that squaring can substantially improve distribution estimation on synthetic and real-world tasks, including GPT-2 distillation. Together, these results position NPC²s as a versatile, scalable tool for tractable probabilistic modeling with negative parameters.
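The claim that squaring keeps a subtractive mixture non-negative while leaving it tractable to normalize can be checked on the smallest possible case: a squared mixture of two univariate Gaussians, one of which has a negative weight. This is a minimal, illustrative sketch and not the authors' implementation: it assumes Gaussian base components, chosen because the integral of a product of two Gaussians has the closed form N(m_i; m_j, v_i + v_j), and it uses a single shallow mixture rather than a deep tensorized circuit. The function names and parameter values are hypothetical. The normalizer is Z = Σ_i Σ_j w_i w_j ∫ f_i(x) f_j(x) dx, and a numerical integral serves as a sanity check.

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture(x, weights, mus, variances):
    """Linear combination c(x) = sum_i w_i N(x; mu_i, var_i); weights may be negative."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))

def partition_function(weights, mus, variances):
    """Z = sum_{i,j} w_i w_j * integral of N_i(x) N_j(x) dx.

    Uses the closed form: integral N(x; m_i, v_i) N(x; m_j, v_j) dx = N(m_i; m_j, v_i + v_j).
    The cost is quadratic in the number of components.
    """
    z = 0.0
    for wi, mi, vi in zip(weights, mus, variances):
        for wj, mj, vj in zip(weights, mus, variances):
            z += wi * wj * gauss_pdf(mi, mj, vi + vj)
    return z

def squared_mixture_pdf(x, weights, mus, variances, z=None):
    """Normalized squared mixture p(x) = c(x)^2 / Z, non-negative for any real weights."""
    if z is None:
        z = partition_function(weights, mus, variances)
    return mixture(x, weights, mus, variances) ** 2 / z

# Hypothetical example: a wide component minus a narrow one carves a dip at x = 0.
weights = [1.0, -0.3]
mus = [0.0, 0.0]
variances = [1.0, 0.1]
z = partition_function(weights, mus, variances)

# Sanity check: a midpoint Riemann sum over [-10, 10] should give a total mass close to 1.
n, lo, hi = 20000, -10.0, 10.0
dx = (hi - lo) / n
mass = sum(squared_mixture_pdf(lo + (k + 0.5) * dx, weights, mus, variances, z)
           for k in range(n)) * dx
print(f"Z = {z:.5f}, numerical mass = {mass:.6f}")
print(f"p(0) = {squared_mixture_pdf(0.0, weights, mus, variances, z):.5f}, "
      f"p(1) = {squared_mixture_pdf(1.0, weights, mus, variances, z):.5f}")
```

For K components, Z costs O(K²) here. Extending this pairwise expansion efficiently to deep circuits is what the structured-decomposability requirement mentioned above is for.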
Abstract
Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures, and we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
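The abstract notes that squared circuits generalize several other subtractive models. One such link, with PSD models, can be seen algebraically: a PSD model f(x)ᵀ A f(x) with A = Σ_k w_k w_kᵀ equals a sum of squared subtractive mixtures Σ_k (w_kᵀ f(x))², and a single squared mixture is the rank-1 case A = w wᵀ. This minimal sketch illustrates only that identity, not the paper's reduction. It assumes Gaussian base components so that the Gram matrix G_ij = ∫ f_i f_j has a closed form, which gives Z = Σ_ij A_ij G_ij. The component parameters, weight vectors, and function names are hypothetical.

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical base components f_1..f_3: Gaussians with fixed means and variances.
MUS = [-1.0, 0.0, 1.5]
VARS = [0.5, 0.2, 1.0]

def features(x):
    """Feature vector f(x) = [f_1(x), ..., f_n(x)]."""
    return [gauss_pdf(x, m, v) for m, v in zip(MUS, VARS)]

def psd_model(x, factors):
    """Unnormalized PSD model f(x)^T A f(x) with A = sum_k w_k w_k^T (PSD by construction)."""
    f = features(x)
    n = len(f)
    a = [[sum(w[i] * w[j] for w in factors) for j in range(n)] for i in range(n)]
    return sum(f[i] * a[i][j] * f[j] for i in range(n) for j in range(n))

def sum_of_squared_mixtures(x, factors):
    """Equivalent form sum_k c_k(x)^2, where each c_k(x) = w_k^T f(x) may subtract components."""
    f = features(x)
    return sum(sum(wi * fi for wi, fi in zip(w, f)) ** 2 for w in factors)

def partition_function(factors):
    """Z = sum_{i,j} A_ij G_ij with Gram matrix G_ij = N(mu_i; mu_j, var_i + var_j)."""
    n = len(MUS)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_ij = sum(w[i] * w[j] for w in factors)
            g_ij = gauss_pdf(MUS[i], MUS[j], VARS[i] + VARS[j])
            total += a_ij * g_ij
    return total

# Hypothetical rank-2 A built from two weight vectors that contain negative entries.
factors = [[1.0, -0.5, 0.3], [0.2, 0.8, -0.4]]
for x in [-2.0, 0.0, 0.7, 3.0]:
    print(x, psd_model(x, factors), sum_of_squared_mixtures(x, factors))
print("Z =", partition_function(factors))
```

Passing a single weight vector as `factors` recovers one squared mixture, which is the sense in which the squared mixture is the rank-1 special case here.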
