Poisson Variational Autoencoder
Hadi Vafaii, Dekel Galor, Jacob L. Yates
TL;DR
The Poisson Variational Autoencoder (P-VAE) addresses the gap between traditional continuous latent VAEs and the discrete, spike-based coding observed in biology by encoding inputs as discrete Poisson spike counts. It combines predictive coding with a Poisson latent space, introducing a Poisson reparameterization trick and deriving an ELBO where the KL term becomes a sparsity-promoting penalty, naturally linking to amortized sparse coding when the decoder is linear. Empirically, the P-VAE avoids posterior collapse, learns sparse, Gabor-like basis vectors comparable to sparse coding, and yields higher-dimensional input representations that improve downstream linear separability and sample efficiency (e.g., about 5x in MNIST classification with limited labels). These results suggest a brain-inspired, interpretable framework for sensory processing that enhances robustness and data efficiency, while highlighting areas for future exploration such as hierarchical architectures and refined inference methods. Mathematically, the model uses prior rates $\boldsymbol{r}$ and posterior rates $\boldsymbol{r} \odot \boldsymbol{\delta r}(\boldsymbol{x})$, with a loss component involving $f(y) = 1 - y + y \log y$. This yields a KL term $\boldsymbol{r} \cdot f(\boldsymbol{\delta r}(\boldsymbol{x}))$ that enforces sparsity and, under a linear decoder, reduces the objective to amortized sparse coding with a reconstruction-error term.
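The sparsity penalty mentioned above can be checked numerically. The standard closed form for the KL divergence between two Poisson distributions with rates $\lambda_q$ and $\lambda_p$ is $\lambda_q \log(\lambda_q/\lambda_p) - \lambda_q + \lambda_p$. Substituting $\lambda_p = r$ and $\lambda_q = r\,\delta r$ gives $r\,f(\delta r)$ with $f(y) = 1 - y + y \log y$. The following is a minimal NumPy sketch of that identity, not the authors' implementation. The function names and the arguments `r` (prior rates) and `dr` (multiplicative posterior gains) are illustrative assumptions.

```python
import numpy as np
from math import lgamma


def f(y):
    """f(y) = 1 - y + y*log(y), extended by continuity so that f(0) = 1."""
    y = np.asarray(y, dtype=float)
    safe = np.where(y > 0, y, 1.0)  # avoid log(0); y*log(y) -> 0 as y -> 0
    return 1.0 - y + y * np.log(safe)


def poisson_kl_closed_form(r, dr):
    """KL( Poisson(r*dr) || Poisson(r) ), summed over latents, as r . f(dr)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    dr = np.atleast_1d(np.asarray(dr, dtype=float))
    return float(np.sum(r * f(dr)))


def poisson_kl_bruteforce(r, dr, max_count=200):
    """Same KL, computed by summing the pmfs over counts 0..max_count.

    Assumes strictly positive rates that are small relative to max_count,
    so the truncated tail is negligible.
    """
    r = np.atleast_1d(np.asarray(r, dtype=float))
    dr = np.atleast_1d(np.asarray(dr, dtype=float))
    k = np.arange(max_count + 1)
    log_fact = np.array([lgamma(n + 1.0) for n in k])  # log(k!)
    total = 0.0
    for prior_rate, gain in zip(r, dr):
        post_rate = prior_rate * gain
        log_q = k * np.log(post_rate) - post_rate - log_fact
        log_p = k * np.log(prior_rate) - prior_rate - log_fact
        total += np.sum(np.exp(log_q) * (log_q - log_p))
    return float(total)


# Hypothetical rates chosen only for illustration.
prior_rates = np.array([1.0, 2.5, 0.3])
gains = np.array([0.5, 1.0, 3.0])
kl_closed = poisson_kl_closed_form(prior_rates, gains)
kl_brute = poisson_kl_bruteforce(prior_rates, gains)
```

In this sketch the penalty is zero for a latent whose gain equals 1 (the posterior matches the prior) and grows as the gain moves away from 1 in either direction, which is consistent with reading the KL term as a cost on deviating from the prior firing rate.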
Abstract
Variational autoencoders (VAEs) employ Bayesian inference to interpret sensory inputs, mirroring processes that occur in primate vision across both ventral (Higgins et al., 2021) and dorsal (Vafaii et al., 2023) pathways. Despite their success, traditional VAEs rely on continuous latent variables, which deviates sharply from the discrete nature of biological neurons. Here, we developed the Poisson VAE (P-VAE), a novel architecture that combines principles of predictive coding with a VAE that encodes inputs into discrete spike counts. Combining Poisson-distributed latent variables with predictive coding introduces a metabolic cost term in the model loss function, suggesting a relationship with sparse coding which we verify empirically. Additionally, we analyze the geometry of learned representations, contrasting the P-VAE to alternative VAE models. We find that the P-VAE encodes its inputs in relatively higher dimensions, facilitating linear separability of categories in a downstream classification task with a much better (5x) sample efficiency. Our work provides an interpretable computational framework to study brain-like sensory processing and paves the way for a deeper understanding of perception as an inferential process.
