Derivation of the Variational Bayes Equations

Alianna J. Maren

TL;DR

This work analyzes how variational Bayes equations can be derived and translated between Beal's canonical formulation and Friston's active inference framework, with explicit attention to a Markov blanket separation between external and representational systems. It presents two equivalent expressions for the variational free energy $F$, namely $F = E_q[L(x)] - H[q(\psi|r)]$ and $F = L(s,a,r) + D_{KL}[q(\psi|r)||p(\psi|s,a,r)]$, and clarifies the roles of the log-likelihood $L$, entropy $H$, and the reverse KL divergence. The paper provides a Rosetta-stone mapping across Beal, Friston, and Blei nomenclatures and explains how integrating over the model space can be framed consistently within active inference. It then extends the framework to a computational engine using the 2-D Cluster Variation Method (CVM) to compute free-energy minima for both external and representational systems, enabling parameter learning via CVM enthalpy and entropy terms. Overall, the work lays out a scalable path for applying variational Bayes and active inference to multi-scale systems via a CVM-based computational engine, with implications for future neural and machine learning architectures.
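The equivalence of the two expressions for $F$ can be checked numerically on a toy discrete system. The following is a minimal sketch, not the paper's implementation. It assumes that $L$ denotes a negative log probability (so that $L(s,a,r) = -\ln p(s,a,r)$ and the energy term is $E_q[-\ln p(\psi,s,a,r)]$), that $\psi$ takes five discrete values, and that the blanket and representational states are held fixed. The function names and the randomly generated probabilities are hypothetical and serve only as illustration.

```python
import numpy as np

def free_energy_energy_minus_entropy(q, p_joint):
    """F = E_q[-ln p(psi, s, a, r)] - H[q], with (s, a, r) held fixed."""
    energy = -np.sum(q * np.log(p_joint))
    entropy = -np.sum(q * np.log(q))
    return energy - entropy

def free_energy_surprise_plus_kl(q, p_joint):
    """F = -ln p(s, a, r) + KL[q(psi|r) || p(psi|s, a, r)]."""
    evidence = p_joint.sum()          # p(s, a, r), marginalising over psi
    posterior = p_joint / evidence    # p(psi | s, a, r)
    surprise = -np.log(evidence)
    kl = np.sum(q * np.log(q / posterior))
    return surprise + kl

# Assumed toy numbers: joint probabilities p(psi = k, s, a, r) for k = 0..4
# at one fixed setting of (s, a, r). They are strictly positive and sum to
# less than 1, as a slice of a joint distribution should.
rng = np.random.default_rng(0)
p_joint = (rng.random(5) + 0.1) * 0.05

# An arbitrary strictly positive variational density q(psi | r).
q = rng.random(5) + 0.1
q /= q.sum()

F1 = free_energy_energy_minus_entropy(q, p_joint)
F2 = free_energy_surprise_plus_kl(q, p_joint)
surprise = -np.log(p_joint.sum())

# When q equals the true posterior, the KL term vanishes and F equals the surprise.
posterior = p_joint / p_joint.sum()
F_at_posterior = free_energy_energy_minus_entropy(posterior, p_joint)

print(F1, F2, surprise, F_at_posterior)
```

Under these assumptions the two forms agree to floating-point precision, and $F$ is bounded below by the surprise, with equality when $q$ matches the posterior. This is why minimizing $F$ over $q$ drives the reverse KL divergence toward zero.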

Abstract

The derivation of the key equations for the variational Bayes approach is well known in certain circles. However, translating the fundamental derivations (e.g., as found in Beal's work) into Friston's notation is somewhat delicate. Further, using variational Bayes in the context of a system with a Markov blanket requires special attention. This Technical Report presents the derivation in detail. It further illustrates how the variational Bayes method provides a framework for a new computational engine incorporating the 2-D cluster variation method (CVM), which supplies the free energy equation that can be minimized separately over the states of the external system and the states of the representational system.

Paper Structure

This paper contains 35 sections, 89 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Illustration of a CORTECON(R) (COntent-Retentive, TEMporally-CONnected neural network) computational engine (Maren, 2016), which includes an internal latent node grid. Within this grid, the total number of active nodes is governed by an activation enthalpy ($\varepsilon_0$) and the degree of clustering is governed by an interaction enthalpy parameter ($\varepsilon_1$). The cluster variation method (CVM) is used to bring the active and non-active nodes into free energy equilibrium. A Markov blanket of sensing and active units corresponds to input and output layers (see Friston et al., 2015). The latent node grid, or "computational layer," can be composed as either a 1-D or 2-D CVM, for which the free energy minimum can be found either analytically (for the case where $\varepsilon_0 = 0$) or computationally (for the case where $\varepsilon_0 \neq 0$). The CVM layer comprises the internal or representational units ($\tilde{r}$), which cannot communicate directly with the external field (shown in two parts for visualization purposes only). However, units within the representational layer can receive inputs from the sensory units ($\tilde{s}$) and send signals to the active units ($\tilde{a}$). The sensory units can receive inputs from external stimuli and send signals to the representational units. The active units can receive inputs from the representational units and send signals to the external system.
  • Figure 2: Diagrammatic illustration of Eqn. \ref{eqn:var-free-energy-eqn_part2-first-time}.
  • Figure 3: In the variational Bayes method described by Friston, the external system, whose units are denoted by $\psi$, interacts with a separate representational system whose units are denoted by $r$. The two systems are separated by a Markov blanket composed of sensing ($s$) and action ($a$) units. (Note: for simplicity, the tilde notation is dropped from this figure.) In comparison with Beal, the distribution $q$ is of the external system, which is conditioned on the representational system; $q = q(\tilde{\psi}|\tilde{r})$. This is feasible because the external system units $\tilde{\psi}$ influence the representational units $\tilde{r}$ through the sensory units $\tilde{s}$. Conversely, the representational units $\tilde{r}$ influence the external units $\tilde{\psi}$ through the active units $\tilde{a}$.
  • Figure 4: Illustration of two systems, arranged so that a 2-D CVM-based free energy can be directly computed for each. (a) The external system, with units denoted $\tilde{\psi}$. (b) The representational system, showing only the representational units ($\tilde{r}$). The Markov blanket around the grid of representational units is not shown in this figure. The dark and light-shaded grey and mottled units along the upper and right edges of each system illustrate the wrap-around from the left and bottom edges, used to compute the configuration variables leading to the free energies of each system. Both systems show an approximately scale-free distribution of islands of dark (A) units in a sea of white (B) units. The systems are designed with an equiprobable distribution of units into states A and B ($x_A = x_B = 0.5$), so that the (reduced) free energies of each can be computed directly, using the analytic solution provided in Maren (2016, 2021). Details of the corresponding thermodynamic calculations are found in Maren (2019). The systems shown in this figure have been hand-designed to illustrate a potential scale-free configuration; they have not yet been brought to a free energy minimum.
  • Figure 5: (a) The external system $\tilde{\Psi}$ has been brought to a free energy minimum for the case where $h = 1.2$. Sampling this system provides different inputs to the representational system with units $\tilde{r}$. In practice, we would not directly know the h-values corresponding to $\tilde{\Psi}$. However, we would trust that the system $\tilde{r}$, taking its configuration values from sensing applied to the units $\tilde{\psi}$, would also be at equilibrium. Finding the h-values for $\tilde{r}$ would give us the parameters for the model $\tilde{p}$, shown in (b). In this particular case, because the full set of h-values corresponding to different configuration values still needs to be developed, the system $\tilde{p}$ was devised for illustration purposes by performing free energy minimization on $\tilde{r}$, shown in the previous figure (Fig. 4), for $h = 1.2$. The equilibrium configuration values for $\tilde{\Psi}$ and $\tilde{p}$ are shown in (c).
  • ...and 1 more figure