
Gain modulation of actions selection without synaptic relearning

Elif Köksal-Ersöz, Pascal Chossat, Frédéric Lavigne

TL;DR

Problem: how to change goal selection after punishment without changing synaptic weights. Approach: gain modulation in a branching network of excitatory populations with inhibition, where learned patterns $\xi^k$ form a state space and punishment reduces the gain $\gamma$ of active units, biasing transitions among branches without altering $J_{ij}$. Findings: punishment-induced gain changes reweight path probabilities, producing exploitation, exploration, or avoidance; effects persist into the next trial $T+1$ and can occur even when punishment targets distant units. Significance: demonstrates a complementary, fast-acting memory mechanism that encodes experience via $\gamma$ rather than synaptic changes, potentially reducing harmful errors while preserving learned structure.
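The TL;DR names the key quantities ($\xi^k$, $J_{ij}$, $\gamma$), but this summary does not reproduce the paper's two equations. Purely as an illustration, one common way to write a gain-modulated rate model of this kind is shown below. The sigmoidal transfer function, the inhibition term $I$, the noise $\eta_i$, the threshold $\theta$ and the punishment level $\pi \in [0,1]$ are assumptions, not the authors' exact notation.

$$
\tau \frac{du_i}{dt} = -u_i + \sum_{j} J_{ij}\, r_j - I + \eta_i(t), \qquad r_i = \frac{1}{1 + e^{-\gamma_i (u_i - \theta)}},
$$

$$
\gamma_i \leftarrow (1-\pi)\,\gamma_i \quad \text{if unit } i \text{ is active at the time of punishment}, \qquad J_{ij} \ \text{unchanged}.
$$

The point of the sketch is the asymmetry: punishment rescales $\gamma_i$, the slope of each unit's response, while leaving $J_{ij}$, and hence the stored patterns $\xi^k$, untouched.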

Abstract

Adaptation of behavior requires the brain to change goals in a changing environment. Synaptic learning has demonstrated its effectiveness in changing the probability of selecting actions based on their outcome. In the extreme case, it is vital not to repeat an action toward a goal that led to harmful punishment. The present model proposes a simple neural mechanism of gain modulation that allows immediate changes in the probability of selecting a goal after punishment of variable intensity. Results show how gain modulation determines the type of elementary navigation process within the state space of a network of populations of excitatory neurons regulated by inhibition. Immediately after punishment, the system can avoid the punished populations by going back or by jumping to unpunished populations. This does not require specific credit assignment at the 'choice' population, only gain modulation of the neurons active at the time of punishment. Gain modulation does not require statistical relearning, which may lead to further errors, yet it can encode memories of past experiences without modifying synaptic efficacies. Therefore, gain modulation can complement synaptic plasticity.

Paper Structure (3 sections, 2 equations, 3 figures)

Figures (3)

  • Figure 1: Branching behavior of a network of $N = 10$ units at trial $T$. (A) Network architecture of 10 units represented by numbered circles. Unit 4 is a branching node between three branches: 0, 1 and 2. The synaptic efficacy between units 4 and 5 is 10% stronger than that between units 4 and 8. (B) Network behavior is described starting from the activation of units 1 and 2 in the initial branch (Br-0). In this example, activation propagates in sequence from unit 1 to unit 4, and (caption continues next page)
  • Figure 2: Branching behavior of a network of $N = 10$ units (see Figure 1A-C) at trial $T+1$. (A) Probability of activation of patterns in the 3 branches during the punished trial $T$, for all levels of punishment (1000 simulations per level). In the absence of punishment ('None'), the system takes branch 1 (Br-1) in 71% of trials. For weak punishment, the system activates Br-1 in 65% of trials and branch 2 (Br-2) in 20% of trials. Moderate punishment equalizes the probabilities of the two branches (41% for Br-1 and 39% for Br-2). For strong punishment, the network activates only Br-2 (0% for Br-1 vs 64% for Br-2). (B) Probability of activation of the patterns in the 3 branches at trial $T+1$ as a function of the level of punishment. Circle size is proportional to the probability of activation of the patterns. The effect of punishment at trial $T$ is maintained at trial $T+1$. Strong punishment at trial $T$ prevents the network from activating the punished branch: activation either goes back to Br-0 or jumps to Br-2.
  • Figure 3: Branching behavior of a network of $N = 12$ units at trial $T+1$. (A) Probability of activation of patterns in the 3 branches immediately after punishment during the punished trial $T$, for all levels of punishment (1000 simulations per level). Punishment does not prevent the network from activating pattern D in branch 1 (Br-1) at trial $T$, owing to its stronger synaptic connection with the branching node 4. (B) Probability of activation of the patterns in the 3 branches at trial $T+1$ following the punished trial $T$, as a function of the level of punishment. Circle size is proportional to the probability of activation of the patterns. Increased punishment decreases the probability of activating patterns E and F in Br-1 at trial $T+1$. Strong punishment at trial $T$ even prevents the network from proceeding along the punished branch at trial $T+1$: the network activates pattern D at the beginning of Br-1, but then either goes back to branch 0 (Br-0) or jumps to branch 2 (Br-2).
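The captions report branch probabilities estimated from 1000 simulations per punishment level. The Python sketch below reproduces only the qualitative trend, using a deliberately crude, hypothetical abstraction that is not the authors' model. At the branching node, each branch receives a fixed synaptic drive `J` (Br-1 10% stronger, as in Figure 1A). That drive is scaled by a gain and perturbed by Gaussian noise, and the strongest response wins. The functions `choose_branch`, `branch_probabilities` and `punish`, the noise level and the punishment levels are all illustrative assumptions. The toy ignores going back to Br-0 and jumps between branches, so it says nothing about avoidance paths.

```python
import random

# Hypothetical toy abstraction (NOT the authors' model): only the choice at the
# branching node is modeled. Synaptic drives J are fixed; gains are the only
# quantities changed by punishment.

def choose_branch(J, gain, noise_sd, rng):
    """Return the branch whose gain-scaled, noisy response is largest."""
    responses = {b: gain[b] * J[b] + rng.gauss(0.0, noise_sd) for b in J}
    return max(responses, key=responses.get)


def branch_probabilities(J, gain, noise_sd=0.1, n_sims=1000, seed=0):
    """Monte Carlo estimate of the probability of entering each branch."""
    rng = random.Random(seed)
    counts = {b: 0 for b in J}
    for _ in range(n_sims):
        counts[choose_branch(J, gain, noise_sd, rng)] += 1
    return {b: c / n_sims for b, c in counts.items()}


def punish(gain, punished, level):
    """Scale down by (1 - level) the gain of branches active at punishment time.

    Returns a new dict; J is never touched, so no synaptic relearning occurs.
    """
    return {b: g * (1.0 - level) if b in punished else g for b, g in gain.items()}


if __name__ == "__main__":
    J = {"Br-1": 1.1, "Br-2": 1.0}      # Br-1 drive 10% stronger (cf. Figure 1A)
    gain = {"Br-1": 1.0, "Br-2": 1.0}   # baseline gains before punishment
    for level in (0.0, 0.05, 0.1, 0.3):  # hypothetical punishment levels
        gain_next = punish(gain, {"Br-1"}, level)  # gains carried into trial T+1
        print(level, branch_probabilities(J, gain_next))
```

Because `J` is never modified, the learned structure is preserved; only the gains, carried over unchanged into the next trial, encode the punishment. Reusing the same seed at every level makes the noise draws identical, so the estimated Br-1 probability can only decrease as the punishment level grows.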