Metabolic cost of information processing in Poisson variational autoencoders

Hadi Vafaii; Jacob L. Yates

TL;DR

This work develops an energy-aware theory of computation by analyzing variational inference under Poisson latent variables. It shows that the Poisson KL term couples information rate to metabolic rate, yielding a 'silence is cheap' objective that promotes sparsity and reduces baseline firing, a property not shared by Gaussian VAEs. The authors derive closed-form, tractable expressions for the reconstruction and KL terms, contrast the P-VAE with a rectified Gaussian variant ($\mathcal{G}_{\text{relu}}$-VAE), and show empirically that increasing the KL weight $\beta$ reduces metabolic cost by nearly two orders of magnitude and increases sparsity in the Poisson model while preserving reconstruction quality. The results establish Poisson variational inference as a principled foundation for energy-constrained computation, with implications for energy-aware algorithm design and neuromorphic hardware.

Abstract

Computation in biological systems is fundamentally energy-constrained, yet standard theories of computation treat energy as freely available. Here, we argue that variational free energy minimization under a Poisson assumption offers a principled path toward an energy-aware theory of computation. Our key observation is that the Kullback-Leibler (KL) divergence term in the Poisson free energy objective becomes proportional to the prior firing rates of model neurons, yielding an emergent metabolic cost term that penalizes high baseline activity. This structure couples an abstract information-theoretic quantity, the *coding rate*, to a concrete biophysical variable, the *firing rate*, which enables a trade-off between coding fidelity and energy expenditure. Such a coupling arises naturally in the Poisson variational autoencoder (P-VAE), a brain-inspired generative model that encodes inputs as discrete spike counts and recovers a spiking form of *sparse coding* as a special case, but is absent from standard Gaussian VAEs. To demonstrate that this metabolic cost structure is unique to the Poisson formulation, we compare the P-VAE against the $\mathcal{G}_{\text{relu}}$-VAE, a Gaussian VAE with ReLU rectification applied to latent samples, which controls for the non-negativity constraint. Across a systematic sweep of the KL term weighting coefficient $\beta$ and latent dimensionality, we find that increasing $\beta$ monotonically increases sparsity and reduces average spiking activity in the P-VAE. In contrast, $\mathcal{G}_{\text{relu}}$-VAE representations remain unchanged, confirming that the effect is specific to Poisson statistics rather than a byproduct of non-negative representations. These results establish Poisson variational inference as a promising foundation for a resource-constrained theory of computation.
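To make the coupling between coding rate and firing rate concrete, the following minimal sketch evaluates the standard closed-form KL divergence between two Poisson distributions and between two univariate Gaussians. It assumes the posterior rate is written as the prior rate times a multiplicative residual $e^{\delta u}$, and the posterior variance as the prior variance times $e^{\delta v}$. The function names `poisson_kl` and `gaussian_kl` and the standardized mean shift `dmu` are illustrative choices, not the authors' code, and the exact form of the paper's equations may differ.

```python
import numpy as np


def poisson_kl(prior_rate, du):
    """KL( Poisson(prior_rate * exp(du)) || Poisson(prior_rate) ), per dimension.

    Uses the textbook identity KL(Pois(a) || Pois(b)) = a*log(a/b) - a + b
    with a = b * exp(du), which factors as b * f(du).
    """
    du = np.asarray(du, dtype=float)
    f = 1.0 - np.exp(du) + du * np.exp(du)  # zero at du = 0, about du**2 / 2 nearby
    return np.asarray(prior_rate, dtype=float) * f


def gaussian_kl(dmu, dv):
    """KL( N(mu_p + sigma_p*dmu, sigma_p**2 * exp(dv)) || N(mu_p, sigma_p**2) ).

    dmu is the mean shift in units of the prior standard deviation
    (an assumed parameterization chosen for this illustration).
    """
    dmu = np.asarray(dmu, dtype=float)
    dv = np.asarray(dv, dtype=float)
    g = 0.5 * (np.exp(dv) - 1.0 - dv)  # zero at dv = 0, about dv**2 / 4 nearby
    return 0.5 * dmu**2 + g


if __name__ == "__main__":
    # Same residual, different baseline rates: the Poisson cost scales with the rate.
    for rate in (0.1, 1.0, 10.0):
        print(rate, float(poisson_kl(rate, 0.5)))
    # Asymmetry: raising the rate costs more than lowering it by the same log-factor.
    print(float(poisson_kl(1.0, 1.0)), float(poisson_kl(1.0, -1.0)))
```

Under these assumptions, the Poisson KL scales linearly with the prior rate, so lowering baseline firing lowers the cost of any given residual. The Gaussian KL, written in the same residual coordinates, has no analogous multiplicative factor.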

Paper Structure (78 sections, 56 equations, 8 figures, 1 table)

Introduction
Contributions.
Background
Notation.
Perception as inference.
Variational inference and the free energy objective.
Distinguishing the two KL terms in the free energy.
Three distributions, one F.
The latent representational form.
Poisson variational autoencoder (P-VAE).
Variational parameters in VAEs.
Inference with learned encoder neural networks.
Residual parameterization in the P-VAE.
A metabolic cost term emerges in the P-VAE, but not in G-VAE
Quadratic approximation.
...and 63 more sections

Figures (8)

Figure 1: Per-dimension cost contributions of the KL term for the $\mathcal{P}$-VAE (left) and $\mathcal{G}$-VAE (right), plotted as a function of the log-residual parameters $\delta u = \log \delta\lambda$ and $\delta v = \log \delta\sigma^2$, respectively. Solid curves show the exact nonlinearities $f$ (\ref{eq:kl_poisson}) and $g$ (\ref{eq:kl_gaussian}); dashed curves show their quadratic approximations. Both functions share a minimum of zero at the identity residual ($\delta u = \delta v = 0$, i.e., posterior $=$ prior) and are locally quadratic, but $f$ grows superquadratically for positive deviations, imposing a stronger penalty on firing rate increases than decreases.
Figure 2: Approximate posterior distributions for a single latent dimension. Left: The $\mathcal{P}$-VAE posterior is a Poisson distribution over nonnegative integers (rate $= 2$). Right: The (unnormalized) $\mathcal{G}_{\text{relu}}$-VAE posterior (solid orange) is obtained by applying a ReLU to Gaussian samples (dashed gray, $\mu = 0$, $\sigma = 1$). Both models produce nonnegative latent representations, but through fundamentally different mechanisms: discrete counting statistics versus continuous rectification.
Figure 3: Mean (left) and variance (right) of $\mathrm{relu}(z)$ for $z \sim \mathcal{N}(\mu, \sigma^2)$, plotted as a function of $\mu$ for several values of $\sigma$. The mean (\ref{eq:grelu_moments_m}) behaves similarly to softplus (dashed), approaching the identity for $\mu \gg 0$ and vanishing for $\mu \ll 0$. The variance (\ref{eq:grelu_moments_v}) behaves as a sigmoid-like (but steeper) gating function, saturating at $\sigma^2$ for $\mu \gg 0$ and vanishing for $\mu \ll 0$. Crucially, rectification couples mean and variance: pushing $\mu \to -\infty$ suppresses both simultaneously, providing a mechanism for the $\mathcal{G}_{\text{relu}}$-VAE to achieve sparse, low-variance representations.
Figure 4: Metabolic cost (\ref{eq:metric_mc}) as a function of KL weighting $\beta$ for different latent dimensionalities. Left: The $\mathcal{P}$-VAE shows a dramatic reduction in metabolic cost with increasing $\beta$, spanning nearly two orders of magnitude. Right: The $\mathcal{G}_{\text{relu}}$-VAE shows no systematic change, remaining near baseline across all $\beta$ values. This confirms that the metabolic cost structure is unique to the Poisson formulation.
Figure 5: Proportion of zeros (\ref{eq:metric_pz}) as a function of KL weighting $\beta$ and latent dimensionality $K$. Left: For the $\mathcal{P}$-VAE, sparsity increases monotonically with $\beta$ across all $K$, with larger $K$ achieving higher sparsity at the same $\beta$. Right: For the $\mathcal{G}_{\text{relu}}$-VAE, sparsity plateaus at $50\%$ for all $\beta \geq 1.0$, regardless of $K$. See \ref{fig:pie} for a pie chart visualization of the $K=512$ results.
...and 3 more figures
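The rectified-Gaussian behavior described in the captions of Figures 3 and 5 can be checked numerically. The sketch below uses the standard closed-form moments of $\mathrm{relu}(z)$ for $z \sim \mathcal{N}(\mu, \sigma^2)$, together with the probability of an exact zero under rectification and under a Poisson distribution. The function names are hypothetical, and these textbook formulas are assumed to correspond to the paper's moment equations rather than copied from them.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal CDF, written with erfc for better accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def relu_gaussian_moments(mu, sigma):
    """Mean and variance of relu(z) for z ~ N(mu, sigma**2)."""
    a = mu / sigma
    cdf, pdf = std_normal_cdf(a), std_normal_pdf(a)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    # Clamp tiny negative values caused by floating-point cancellation.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, variance


def relu_gaussian_prob_zero(mu, sigma):
    """P(relu(z) = 0) = P(z <= 0) for z ~ N(mu, sigma**2)."""
    return std_normal_cdf(-mu / sigma)


def poisson_prob_zero(rate):
    """P(count = 0) for a Poisson distribution with the given rate."""
    return math.exp(-rate)


if __name__ == "__main__":
    for mu in (-4.0, 0.0, 4.0):
        print(mu, relu_gaussian_moments(mu, 1.0), relu_gaussian_prob_zero(mu, 1.0))
    print(poisson_prob_zero(2.0))
```

At $\mu = 0$ the rectified Gaussian is zero with probability exactly one half for any $\sigma$, which is consistent with the $50\%$ sparsity plateau reported for the $\mathcal{G}_{\text{relu}}$-VAE if its posterior means stay near the standard normal prior mean; that link is an interpretation, not a statement from the source.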
