Neural Policy Composition from Free Energy Minimization

Francesca Rossi, Veronica Centorrino, Francesco Bullo, Giovanni Russo

TL;DR

GateMod is an interpretable, theoretically grounded computational model that links the emergence of gating to the underlying decision-making task and to a neural circuit architecture. It offers a unifying framework for neural policy gating that connects task objectives, dynamical computation, and circuit-level mechanisms.

Abstract

The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits remains elusive. Here we introduce GateMod, an interpretable, theoretically grounded computational model linking the emergence of gating to the underlying decision-making task and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating as the minimization of free energy. This framework, which relates gating rules to the task, applies broadly across neuroscience, cognitive, and computational sciences. We then derive GateFlow, a continuous-time, energy-based dynamics that provably converges to the GateFrame optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive from GateFlow a neural circuit, GateNet. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod in two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In both settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.

Paper Structure

This paper contains 18 sections, 14 equations, 4 figures.

Figures (4)

  • Figure 1: GateMod Set-up. A At time step $k-1$, an agent (e.g., a boid in a flock, a person in a multi-armed bandit task, or an autonomous agent) receives the state $\mathbf{x}_{k-1}$ from the environment and determines the action $\mathbf{u}_k$. Both $\mathbf{x}_{k-1}$ and $\mathbf{u}_k$ are realizations of random variables, $\mathbf{X}_{k-1}$ and $\mathbf{U}_k$. We denote random variables with upper-case letters and their realizations with lower-case letters. Bold means that the variable is, in general, a vector. B At each time step, the agent computes the optimal policy $p^{\star}_{\mathbf{u}}\left(\mathbf{u}_{k}\mid \mathbf{x}_{k-1}\right)$ by combining a set of available primitives $\pi^{1}\left(\mathbf{u}_{k}\mid \mathbf{x}_{k-1}\right),\ldots,\pi^{n_{\pi}}\left(\mathbf{u}_{k}\mid \mathbf{x}_{k-1}\right)$ via a gating mechanism. GateMod provides a normative framework (GateFrame) to optimally choose the weights, a continuous-time dynamics (GateFlow) to provably find the weights, and a neural circuit (GateNet) implementing this continuous-time solver. Intuitively, given a task (formalized via a generative model) and a model of the environment, GateMod computes the weights, $\mathbf{w}^{\star}_{k}$, to linearly combine the primitives.
  • Figure 2: GateMod. A GateFrame normative framework. At each time step, the agent computes optimal policy weights $\mathbf{w}_k^\star$ by solving an entropy-regularized optimization problem that minimizes a trade-off between statistical complexity and entropy. The constraints formalize the fact that the resulting policy is a linear, and specifically convex, combination of primitives. The optimal weights correspond to the equilibrium of GateFlow: a continuous-time dynamical system defined by a softmax gradient flow. This is an energy model that provably converges to the GateFrame optimal solution. B GateFlow is an energy model featuring highly ordered behaviors with a guaranteed, explicit exponential convergence rate to the optimal solution. The GateFrame objective is an energy function for GateFlow, so the energy decreases along its trajectories towards the optimal solution (top). Convergence is global (bottom): regardless of the initial conditions (the initialization value for the weights), GateFlow trajectories converge to $\mathbf{w}^{\star}_{k}$. Convergence follows from a stronger contractivity property that also confers robustness and other desirable properties. C GateFlow admits a neural implementation. The architecture consists of two coupled modules operating at different timescales: a fast subsystem that computes the gradient of the objective using local operations (linear summation, logarithmic activation) and a slower subsystem that implements the softmax activation function using exponential and logarithmic activation functions. The fast unit features contextual computations that are based on the current state. These computations can be implemented via the Sigma-Pi model. The input to the fast unit is a cost combining a mismatch from the generative model and a log-likelihood. The result, aligned with the literature on distributional costs in biological neural circuits, is a vector associated with the action space rather than a single mean value.
  • Figure 3: A A boid in a flock of $N$ boids. Position and velocity components form the $4$-dimensional state $\mathbf{x}_k^i$; $\mathbf{u}_k^i$ is the acceleration vector. The superscript denotes that states/actions are those of the $i$-th boid in the flock. The acceleration is built upon the social forces, and a boid can only use information from boids within its field of view. The field angle, $\alpha$, is set to $320^\circ$ in the experiments. The radii correspond to three concentric separation, alignment and cohesion zones [FC-SS:07, HL-WJR-IC:00]. B GateFrame optimization is solved via GateFlow. Starting from a feasible initial condition, GateFlow trajectories converge to the GateFrame optimal solution. We recall that GateFlow is an energy model and, along its trajectories, the energy $\mathsf{F}-\varepsilon\mathsf{H}$ decreases. C GateMod recovers polarization. Trajectories of the $N=40$ boids from random initial conditions (left). The group exhibits polarization, and this is confirmed in the middle panel, showing the evolution of the polarization order parameter (the average normalized speed across boids) over simulation time. A value of $1$ for this parameter indicates perfect alignment between boids. Right: time evolution of the optimal primitives' weights from GateFlow. The evolution is shown for a representative agent and reveals that, after an initial transient in which cohesion prevails, the weights tend toward a nearly uniform distribution. The weight evolution for all the boids in the experiments is in Fig. S1 of Supplementary Information. See also Sec. 5 therein. GateMod can also recover milling when a collision-avoidance term is introduced in the generative model. See Fig. S2 in Supplementary Information. D Collective behavior of boids when the generative model of a few boids (10%) encodes a goal-directed behavior. The other boids have the same generative model as in Fig. 3C. Trajectories of the boids from random initial positions (left) illustrate global convergence toward the goal; the temporal evolution of group-level metrics (middle) reveals a transition from disordered movement to polarized, goal-directed behavior; the time evolution of the optimal primitives' weights is shown for a representative goal-informed boid and an uninformed one (right). The evolution of the informed boid's weights suggests an adaptive goal-directed behavior; the uninformed boid's weights, after an initial transient in which the boid gets closer to and aligned with the others, tend toward a uniform distribution. Findings are confirmed for different numbers of informed boids, different temperature values and different models for the followers. See Sec. 5 in Supplementary Information. Simulation parameters are in Tab. S1 of Supplementary Information.
  • Figure 4: A Comparison between the Hybrid model from [SJG:18] and GateMod in terms of PXP. A higher PXP for a given model suggests that the model provides a better explanation of the data. Formally, PXP quantifies the probability that each considered model is the most frequent process that generated the data. To obtain the PXP, we start from the GateMod optimal policy. The policy at each trial is used to compute the Bayesian Information Criterion (BIC) [GS:78, AAN-JEC:12] values. Then, these are submitted to hierarchical Bayesian model selection [LR-KES-KJF-JD:14]. B PXP comparison between UCB, Thompson, Value and GateMod. GateMod robustly achieves the highest PXP across both experiments. C Evolution of the mean (bold line) and standard deviation (shaded area) of the primitives' weights across subjects per trial. The weights show a pattern, suggesting that primitives might encode a mental schema adopted in similar ways by humans in the same context.
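
The softmax gradient flow described in the caption of Figure 2 can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a discrete action set and a single fixed state, a free energy of the form $\mathsf{F}(\mathbf{w}) = \mathrm{KL}(p_{\mathbf{w}} \,\|\, q) + \mathbb{E}_{p_{\mathbf{w}}}[c]$ with $p_{\mathbf{w}} = \sum_i w_i \pi^i$, and dynamics $\dot{\mathbf{w}} = -\mathbf{w} + \mathrm{softmax}(-\nabla\mathsf{F}(\mathbf{w})/\varepsilon)$ integrated with forward Euler. The primitives, generative model `GENERATIVE`, cost `COST`, and all numerical values are hypothetical.

```python
import math

# Hypothetical toy data (not from the paper): 3 primitives over 4 discrete
# actions, all conditioned on one fixed state.
PRIMITIVES = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
GENERATIVE = [0.40, 0.30, 0.20, 0.10]  # assumed generative model q(u | x)
COST = [0.0, 0.5, 1.0, 1.5]            # assumed per-action cost c(u)
EPSILON = 0.5                          # assumed entropy weight (temperature)


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mixture(w):
    """Composed policy p_w(u) = sum_i w_i * pi_i(u)."""
    return [sum(wi * pi[a] for wi, pi in zip(w, PRIMITIVES))
            for a in range(len(GENERATIVE))]


def energy(w):
    """Assumed objective F(w) - eps * H(w), with F = KL(p_w || q) + E[c]."""
    p = mixture(w)
    free_energy = sum(pa * (math.log(pa / qa) + ca)
                      for pa, qa, ca in zip(p, GENERATIVE, COST))
    entropy = -sum(wi * math.log(wi) for wi in w if wi > 0.0)
    return free_energy - EPSILON * entropy


def grad_free_energy(w):
    """dF/dw_i = sum_u pi_i(u) * (log(p_w(u) / q(u)) + 1 + c(u))."""
    p = mixture(w)
    return [sum(pia * (math.log(pa / qa) + 1.0 + ca)
                for pia, pa, qa, ca in zip(pi, p, GENERATIVE, COST))
            for pi in PRIMITIVES]


def softmax_target(w):
    """Right-hand-side attractor softmax(-grad F(w) / eps)."""
    return softmax([-g / EPSILON for g in grad_free_energy(w)])


def run_gateflow(w0, dt=0.05, steps=2000):
    """Forward-Euler integration of dw/dt = -w + softmax(-grad F(w) / eps)."""
    w = list(w0)
    energies = [energy(w)]
    for _ in range(steps):
        target = softmax_target(w)
        # Convex combination of w and target, so w stays on the simplex.
        w = [(1.0 - dt) * wi + dt * ti for wi, ti in zip(w, target)]
        energies.append(energy(w))
    return w, energies


# Two different feasible initial conditions, to probe global convergence.
w_a, energies_a = run_gateflow([1 / 3, 1 / 3, 1 / 3])
w_b, energies_b = run_gateflow([0.98, 0.01, 0.01])
print("weights from uniform start:", [round(v, 4) for v in w_a])
print("weights from skewed start: ", [round(v, 4) for v in w_b])
print("energy start/end:", round(energies_b[0], 4), round(energies_b[-1], 4))
```

Under these assumptions, the equilibrium satisfies $\mathbf{w} = \mathrm{softmax}(-\nabla\mathsf{F}(\mathbf{w})/\varepsilon)$, which is the stationarity condition of the entropy-regularized problem on the simplex. Running the flow from two initial conditions is a quick numerical check of the global convergence claimed in the caption; it is not a substitute for the contractivity proof.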
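
The polarization order parameter in Figure 3C can be made concrete with a short sketch. The exact formula is not given in the caption, so the code below assumes the definition commonly used in the collective-behavior literature: the norm of the average unit-velocity vector, which equals $1$ when all boids share the same heading. The function name `polarization` and the 2-D velocity tuples are illustrative choices.

```python
import math


def polarization(velocities):
    """Norm of the mean unit-velocity vector (assumed definition).

    `velocities` is a non-empty list of (vx, vy) tuples, one per boid.
    Returns a value in [0, 1]; 1.0 means all headings coincide.
    """
    if not velocities:
        raise ValueError("need at least one boid")
    sum_x = sum_y = 0.0
    for vx, vy in velocities:
        speed = math.hypot(vx, vy)
        if speed == 0.0:
            continue  # a stationary boid has no heading and contributes zero
        sum_x += vx / speed
        sum_y += vy / speed
    return math.hypot(sum_x, sum_y) / len(velocities)


# Same heading at different speeds versus two opposite headings.
aligned = polarization([(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)])
opposed = polarization([(1.0, 0.0), (-1.0, 0.0)])
print(aligned, opposed)
```

Normalizing each velocity before averaging makes the measure depend on heading only, so fast and slow boids count equally.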
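
For the model comparison in Figure 4, the caption states that the policy at each trial is used to compute BIC values. A minimal sketch of that step, assuming the standard definition $\mathrm{BIC} = k \ln n - 2 \ln \hat{L}$ and a log-likelihood built from the probability the policy assigned to each observed choice, is shown below. The function name, the `n_params` argument, and the example probabilities are hypothetical; the subsequent hierarchical Bayesian model selection that produces the PXP is not reproduced here.

```python
import math


def bic_from_choice_probabilities(choice_probs, n_params):
    """BIC = k * ln(n) - 2 * ln(L) for one subject.

    `choice_probs[t]` is the probability the model's policy assigned to the
    action actually chosen at trial t; `n_params` is the number of free
    parameters k (an assumption of this sketch).
    """
    n_trials = len(choice_probs)
    log_likelihood = sum(math.log(p) for p in choice_probs)
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


# Hypothetical example: a model at chance level on a two-armed bandit,
# compared with one that assigns higher probability to the observed choices.
bic_chance = bic_from_choice_probabilities([0.5, 0.5, 0.5, 0.5], n_params=0)
bic_better = bic_from_choice_probabilities([0.8, 0.7, 0.9, 0.6], n_params=1)
print(bic_chance, bic_better)
```

Lower BIC indicates a better trade-off between fit and complexity, which is why per-subject BIC values are a natural input to a group-level model-selection procedure.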