Confidence and second-order errors in cortical circuits

Arno Granier, Mihai A. Petrovici, Walter Senn, Katharina A. Wilmes

TL;DR

This work develops a normative, probabilistic framework for cortical processing that explicitly models confidence (inverse uncertainty) at each hierarchical level: higher-level activity r_{ℓ+1} sets both the prediction μ_ℓ = W_ℓ r_{ℓ+1} and its precision π_ℓ = A_ℓ r_{ℓ+1}. A global energy E = 1/2 ∑_ℓ ||e_ℓ||^2_{π_ℓ} − 1/2 ∑_ℓ log|π_ℓ|, with prediction errors e_ℓ = u_ℓ − μ_ℓ, drives gradient-based neuronal dynamics that balance bottom-up and top-down information according to their confidence and that incorporate second-order errors δ_ℓ = (π_ℓ^{-1} − e_ℓ^2)/2. The model yields learning rules for both the prediction weights W_ℓ and the confidence weights A_ℓ, demonstrates Bayes-optimal integration, and proposes concrete circuit-level instantiations involving L6p, L3e, L3δ, and apical dendrites, with VIP/SST disinhibitory circuits implementing confidence modulation. These findings offer a principled link between predictive coding, attention-like gain control, and second-order error signals, providing testable predictions for cortical circuitry and potential relevance to neuropsychiatric theories of uncertainty weighting.
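
As a worked step, the second-order error can be read off as the negative gradient of the energy with respect to the precision. This is a sketch that assumes a diagonal precision (one inverse variance $\pi_{\ell,i}$ per neuron $i$, as the elementwise products in the figure captions suggest), so that $\|e_\ell\|^2_{\pi_\ell} = \sum_i \pi_{\ell,i} e_{\ell,i}^2$ and $\log|\pi_\ell| = \sum_i \log \pi_{\ell,i}$; the index $i$ is notation introduced here.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_{\ell}\sum_{i} \pi_{\ell,i}\, e_{\ell,i}^{2}
     \;-\; \tfrac{1}{2}\sum_{\ell}\sum_{i} \log \pi_{\ell,i}, \\
-\frac{\partial E}{\partial \pi_{\ell,i}}
  &= \tfrac{1}{2}\left(\pi_{\ell,i}^{-1} - e_{\ell,i}^{2}\right)
   = \delta_{\ell,i}.
\end{aligned}
```

Under this reading, $\delta_{\ell,i}$ averages to zero exactly when the predicted variance $\pi_{\ell,i}^{-1}$ matches the mean squared prediction error, which is the sense in which second-order errors compare confidence with actual performance.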

Abstract

Minimization of cortical prediction errors has been considered a key computational goal of the cerebral cortex underlying perception, action and learning. However, it is still unclear how the cortex should form and use information about uncertainty in this process. Here, we formally derive neural dynamics that minimize prediction errors under the assumption that cortical areas must not only predict the activity in other areas and sensory streams but also jointly project their confidence (inverse expected uncertainty) in their predictions. In the resulting neuronal dynamics, the integration of bottom-up and top-down cortical streams is dynamically modulated based on confidence in accordance with the Bayesian principle. Moreover, the theory predicts the existence of cortical second-order errors, comparing confidence and actual performance. These errors are propagated through the cortical hierarchy alongside classical prediction errors and are used to learn the weights of synapses responsible for formulating confidence. We propose a detailed mapping of the theory to cortical circuitry, discuss entailed functional interpretations and provide potential directions for experimental work.
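
The confidence-based integration described above can be illustrated with a scalar toy example. This is a minimal sketch, not the paper's full dynamics: it assumes a single unit `u` that receives a top-down prediction `mu_prior` with precision `pi_prior` and that predicts an observed lower-level value `x` with precision `pi_data` through an identity weight. The function names, the Euler step `dt` and the step count are illustrative choices. The bottom-up error is weighted multiplicatively by the data confidence and divisively by the prior confidence, following the weighting described in the Figure 2 caption.

```python
def settle(mu_prior, pi_prior, x, pi_data, dt=0.05, steps=2000):
    """Euler-integrate a scalar toy version of confidence-weighted inference."""
    u = mu_prior  # start at the top-down prediction
    for _ in range(steps):
        top_down = mu_prior - u                     # pull toward the prior mean
        bottom_up = (pi_data / pi_prior) * (x - u)  # confidence-weighted error
        u += dt * (top_down + bottom_up)
    return u


def bayes_mean(mu_prior, pi_prior, x, pi_data):
    """Posterior mean for a Gaussian prior combined with a Gaussian observation."""
    return (pi_prior * mu_prior + pi_data * x) / (pi_prior + pi_data)


if __name__ == "__main__":
    # A confident prior (precision 4) against less reliable data (precision 1).
    print(settle(0.0, 4.0, 1.0, 1.0), bayes_mean(0.0, 4.0, 1.0, 1.0))
```

In this toy setting the fixed point of the dynamics is the precision-weighted average of prediction and data, so the settled activity sits closer to whichever stream carries the higher confidence.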

Paper Structure

This paper contains 25 sections, 40 equations, 10 figures.

Figures (10)

  • Figure 1: Predictive distributions in the cortical hierarchy. (a) Probabilistic model. Latent representations ($\bm{u_{\ell}}$) are organized in a strict generative hierarchy. (b) Predictions are Gaussian distributions. Both the mean ($\bm{\mu_{\ell}} = \bm{W_{\ell}r_{\ell+1}}$, first-order) and the confidence ($\bm{\pi_{\ell}} = \bm{A_{\ell}r_{\ell+1}}$, inverse variance, second-order) are functions of higher-level activity.
  • Figure 2: Adaptive balancing of cortical streams based on confidence. (a) Divisive weighting of errors by the confidence of top-down predictions about what the activity of a neuron should be (prior confidence, $\bm{\pi_{\ell}}^{-1}$). (b) Multiplicative weighting of errors by the confidence of predictions that a neuron makes about what the activity of other neurons should be (data confidence, $\bm{\pi_{\ell-1}}$).
  • Figure 3: Propagation of second-order errors for classification. (a) Second-order errors compare confidence and performance (magnitude of prediction errors). (b) A 2x2 network for binary classification. During learning, the $\mathrm{X}$ and $\mathrm{Y}$ data are sampled from one of the two class distributions, and the activity of neurons representing the class is clamped to the one-hot encoded correct class. Parameters ($\bm{W}, \bm{A}$) are then learned following Eqs. \ref{eq:Wdot} and \ref{eq:Adot}. During inference, the activity of neurons representing the class follows neuronal dynamics (without top-down influence), and we read the selected class as the one corresponding to the most active neuron. Prediction error (first-order) propagation is omitted in the depiction. (c) Maximizing the likelihood of predictions leads to nonlinear classification in a single area. (di) Two different 2-dimensional binary classification tasks. The ellipse represents the true class distributions for the two classes. (dii) Classification with second-order error propagation. (diii) Classification without second-order error propagation. (e) Classification accuracy on the task presented in d, second column.
  • Figure 4: Cortical circuit for neuronal dynamics of inference (as described in Eq. \ref{eq:udot} and Eq. \ref{eq:a}). (a) Representations ($\bm{u_{\ell}}$) are held in the somatic membrane potential of L6p. Top-down synapses carrying predictions ($\bm{\mu_{\ell}}=\bm{W_{\ell}r_{\ell+1}}$) directly excite L6p at proximal dendrites. Bottom-up confidence-weighted prediction errors ($\bm{W_{\ell-1}}^T(\bm{\pi_{\ell-1}} \circ \bm{e_{\ell-1}})$) and second-order errors ($\bm{A_{\ell-1}}^T\bm{\delta_{\ell-1}}$) are integrated into total error ($\bm{a_{\ell}}$) in the distal dendrites of L6p as described in Eq. \ref{eq:a}. This total error is then weighted by the prior uncertainty ($\bm{\pi_{\ell}}^{-1}$) through divisive dendritic inhibition realized by deep SST-expressing interneurons. (b) Top-down predictions ($\bm{\mu_{\ell}}=\bm{W_{\ell}r_{\ell+1}}$) and local representations ($\bm{u_{\ell}}$) are compared in L3$e$. Confidence weighting is then realized through gain modulation of L3$e$ by the disinhibitory circuit of VIP-expressing and SST-expressing interneurons. (c) L3$\delta$ compares top-down confidence and local squared prediction errors encoded in basket cells (BC) into re-weighted second-order errors.
  • Figure S2: Error-correcting synaptic learning. (a) In these simulations, we consider a higher area with $N_{\ell+1}$ neurons and a lower area with $N_{\ell}$ neurons. Specifically, here we take $N_{\ell+1}=N_{\ell}=100$. The activity vector in the higher area can take $N_c$ different values [$\bm{r_n},~n\!=\!1,\dots,N_c$], to each of which is associated a different mean [$\bm{\mu_n}$] and a different variance [$\bm{\sigma_n}^2$]. The activity in the lower area is then sampled from the Gaussian distribution with this mean and variance. Predictions [$\bm{Wr_i}$] and confidence estimates [$\bm{Ar_i}$] are formed from the higher-level representation and prediction errors [$\bm{e}=\bm{x}-\bm{Wr_i}$] and second-order errors [$\bm{\delta}=\bm{1}/\bm{Ar_i}-\bm{e}^2$] are computed and used to learn parameters [$\bm{W}$ and $\bm{A}$]. For simulations marked (random), higher-level representations are random binary vectors with an average of 50% of ones. For simulations marked (one-hot), higher-level representations are one-hot encoded. (b) Here we show that with the learning rule Eq. 4 the network correctly learns to estimate the means [$\bm{\mu_n},~n\!=\!1,\dots,N_c$] from higher-level activity [$\bm{r_n},~n\!=\!1,\dots,N_c$]. In these simulations we suppose that the confidence estimate is $1$. (c) Here we show that with the learning rule Eq. 5 the network correctly learns to estimate the confidences [$\bm{1}/\bm{\sigma_n}^2$] from higher-level activity [$\bm{r_n}$].
  • ...and 5 more figures
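
The error-correcting learning setup described in the Figure S2 caption can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' simulation code. It assumes the one-hot case, uses much smaller areas than the $N_{\ell+1}=N_{\ell}=100$ of the caption, and takes the updates to be plain gradient steps on the energy from the TL;DR ($\Delta W \propto (\pi \circ e)\, r^T$ and $\Delta A \propto \delta\, r^T$); the paper's Eqs. 4 and 5 are not reproduced here and may differ in detail. The second-order error keeps the factor 1/2 from the TL;DR, which the caption omits and which only rescales the learning rate. The function name `train`, the learning rate, the step count and the lower bound on `A` are illustrative assumptions.

```python
import numpy as np


def train(n_lower=5, n_classes=3, steps=40_000, lr=0.005, seed=0):
    """Learn prediction weights W and confidence weights A from samples."""
    rng = np.random.default_rng(seed)
    # Ground truth: one mean and one variance per lower neuron and per class.
    mu = rng.uniform(-1.0, 1.0, size=(n_lower, n_classes))
    sigma2 = rng.uniform(0.5, 2.0, size=(n_lower, n_classes))

    W = np.zeros((n_lower, n_classes))  # prediction weights
    A = np.ones((n_lower, n_classes))   # confidence weights (start at precision 1)

    classes = rng.integers(n_classes, size=steps)
    noise = rng.standard_normal((steps, n_lower))
    eye = np.eye(n_classes)

    for n, z in zip(classes, noise):
        r = eye[n]                                 # one-hot higher-level activity
        x = mu[:, n] + np.sqrt(sigma2[:, n]) * z   # sampled lower-level activity
        pi = A @ r                                 # confidence estimate
        e = x - W @ r                              # first-order prediction error
        delta = 0.5 * (1.0 / pi - e**2)            # second-order error
        W += lr * np.outer(pi * e, r)              # assumed gradient step on W
        A += lr * np.outer(delta, r)               # assumed gradient step on A
        np.maximum(A, 0.05, out=A)                 # numerical safeguard (assumption)
    return W, A, mu, sigma2


if __name__ == "__main__":
    W, A, mu, sigma2 = train()
    print("max |W - mu|        :", np.abs(W - mu).max())
    print("max |A - 1/sigma^2| :", np.abs(A - 1.0 / sigma2).max())
```

With one-hot inputs each column of `W` and `A` is trained only on samples from its own class, so the columns should approach the class means and inverse variances respectively, up to fluctuations set by the learning rate.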