Variational Distributional Neuron

Yves Ruffenach

TL;DR

We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE brick that explicitly carries a prior, an amortized posterior and a local ELBO. The contribution is then extended over time via per-unit autoregressive priors over the latent.

Abstract

We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE brick, explicitly carrying a prior, an amortized posterior and a local ELBO. The unit is no longer a deterministic scalar but a distribution: computing is no longer about propagating values, but about contracting a continuous space of possibilities under constraints. Each neuron parameterizes a posterior, propagates a reparameterized sample and is regularized by the KL term of a local ELBO; the activation is therefore distributional. This "contraction" becomes testable through local constraints and can be monitored via internal measures. The amount of contextual information carried by the unit and the temporal persistence of this information are locally tuned by distinct constraints. This proposal addresses a structural tension: in sequential generation, causality is predominantly organized in the symbolic space and, even when latents exist, they often remain auxiliary, while the effective dynamics are carried by a largely deterministic decoder. In parallel, probabilistic latent models capture factors of variation and uncertainty, but that uncertainty typically remains borne by global or parametric mechanisms, while units continue to propagate scalars. Hence the pivot question: if uncertainty is intrinsic to computation, why does the compute unit not carry it explicitly? We therefore draw two axes: (i) the composition of probabilistic constraints, which must be made stable, interpretable and controllable; and (ii) granularity: if inference is a negotiation of distributions under constraints, should the primitive unit remain deterministic or become distributional? We analyze "collapse" modes and the conditions for a "living neuron", then extend the contribution over time via autoregressive priors over the latent, per unit.

Paper Structure (95 sections, 13 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Micro vs macro dynamics. (A) Global AR-VAE: a vector latent state $z_t\in\mathbb{R}^K$ follows a state-space-like dynamics. (B) AR-VAE neuron network: local latent states $z_t^{(i)}$ evolve via local transitions (sparse graph) and are aggregated to produce the output.
  • Figure 2: Architecture of a VAE neuron (distributional neuron). Each unit implements a local step of variational inference: $x \mapsto q_\phi(z\!\mid\!x)=\mathcal{N}(\mu_\phi(x),\sigma_\phi(x)^2)$, sampling $z=\mu_\phi(x)+\sigma_\phi(x)\varepsilon$, $\varepsilon\sim\mathcal{N}(0,1)$, then emission $p_\theta(a\!\mid\!z,x)$. The prior $p(z)$ regularizes via $\mathrm{KL}(q_\phi(z\!\mid\!x)\,\|\,p(z))$, defining a local ELBO ("spring" toward the prior).
  • Figure 3: Classic deterministic neuron vs distributional neuron (VAE neuron). Left: deterministic scalar activation $a=w^\top x+b$. Right: local variational inference with internal latent $z$, amortized posterior $q_\phi(z\!\mid\!x)$, reparameterization, emission $p_\theta(a\!\mid\!z,x)$, and KL regularization. The deterministic neuron is recovered as the limit $\sigma_\phi(x)\to 0$.
  • Figure 4: LongHorizon (in=336, H=96): performance and control regimes. Test MSE (mean $\pm$ std, $n=5$) for three EVE variants: homeo (homeostasis enabled), projON (hard projection) and projOFF (no internal control). Internal controls define regimes and are not meant as a systematic MSE booster.
  • Figure 5: Global correlation: out vs MSE. Scatter plot over $N=60$ runs (z-score per dataset), showing the positive correlation between the out-of-band fraction (out) and test MSE.
  • ...and 2 more figures
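
The local inference step described in the captions of Figures 2 and 3 can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes a scalar latent, a standard normal prior $p(z)=\mathcal{N}(0,1)$, and affine maps for $\mu_\phi(x)$ and $\log\sigma_\phi(x)^2$. The function name `vae_neuron_forward` and its parameter names are hypothetical. The emission $p_\theta(a\mid z,x)$ and the reconstruction term of the local ELBO are left out, so only the posterior, the reparameterized sample and the KL term appear.

```python
import numpy as np

def vae_neuron_forward(x, w_mu, b_mu, w_logvar, b_logvar, rng):
    """One forward step of a scalar distributional neuron (illustrative only).

    Assumptions not taken from the paper: affine maps for the posterior
    mean and log-variance, and a standard normal prior p(z) = N(0, 1).
    """
    mu = float(w_mu @ x + b_mu)              # posterior mean mu_phi(x)
    logvar = float(w_logvar @ x + b_logvar)  # log of sigma_phi(x)^2
    sigma = float(np.exp(0.5 * logvar))      # posterior std sigma_phi(x)

    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    eps = rng.standard_normal()
    z = mu + sigma * eps

    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ): the "spring" toward the prior.
    kl = 0.5 * (mu**2 + sigma**2 - 1.0 - logvar)
    return z, mu, sigma, kl

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([1.0, 2.0])
    z, mu, sigma, kl = vae_neuron_forward(
        x, np.array([0.5, -0.25]), 0.1, np.zeros(2), -2.0, rng
    )
    print(f"z={z:.4f} mu={mu:.4f} sigma={sigma:.4f} kl={kl:.4f}")
```

Driving the log-variance bias toward large negative values makes $\sigma_\phi(x)\to 0$, so the sample $z$ approaches the affine mean, which matches the deterministic limit stated in Figure 3. Under the standard normal prior assumed here, the KL term grows in that limit, which is one way to read the "spring" that pulls the posterior back toward the prior.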