Attention when you need

Lokesh Boominathan, Yizhou Chen, Matthew McGinley, Xaq Pitkow

Abstract

Being attentive to task-relevant features can improve task performance, but paying attention comes with its own metabolic cost. Strategic allocation of attention is therefore crucial to performing a task efficiently, and this work aims to understand that strategy. Recently, de Gee et al. conducted experiments in which mice performed an auditory sustained attention-value task. The task required the mice to exert attention to identify whether a high-order acoustic feature was present amid noise. By varying the trial duration and reward magnitude, the task allows us to investigate how an agent should strategically deploy its attention to maximize its benefits and minimize its costs. We develop a reinforcement learning-based normative model of the mouse to understand how it balances the cost of attention against its benefits. In the model, at each moment the mouse can choose between two levels of attention and decide when to take costly actions that could obtain rewards. Our model suggests that efficient use of attentional resources involves alternating blocks of high attention with blocks of low attention. In the extreme case where the agent disregards sensory input during low-attention states, high attention is used rhythmically. Our model provides evidence about how one should deploy attention as a function of task utility, signal statistics, and how attention affects sensory evidence.

Paper Structure (13 sections, 7 figures)

Figures (7)

  • Figure 1: (A) Experimental setup, modified with permission from de Gee et al. [de2022strategic]. Mice are subjected to noise and signal phases separated by inter-trial intervals (ITIs). Hits during the signal phase result in rewards, while false alarms during the noise phase lead to time-outs. The blocks alternate between low (2 $\mu$l) and high (12 $\mu$l) sugar-water rewards. (B) Behavioral data showing an increase in the fraction of hits and false alarms (FAs) and a decrease in reaction times (RTs) as the subjective value of the food reward increases across blocks.
  • Figure 2: Formal task structure. (A) In this partially observable Markov decision process (POMDP), the agent constructs beliefs $b_t$ that reflect inferences about the latent world state using observations $o_t$ controlled by the 'attend' action $a_t$. The agent uses these beliefs to plan 'lick' actions that may obtain rewards. (B) Specific world states and transitions in the finite state machine modeling this task. We choose a per-step transition probability $p=0.024$ so that the signal begins after an exponentially distributed time interval with a mean of 5 seconds. The signal then runs through a sequence of $N=25$ states, incremented once per time step, for a total of 3 seconds (so each time step is $3/25 = 0.12$ seconds, and the mean noise duration is $0.12/0.024 = 5$ seconds). Lick actions produce rewards when taken during the signal period, and otherwise lead to a false alarm and a penalty.
  • Figure 3: Illustration of the agent's interaction with a simulated trial. (A) The top row indicates whether the trial is in the noise (white) or signal (black) phase. The second row shows the agent's observations, with hatched patterns marking those obtained through low attention. The color indicates whether the agent received a $0$ (white) or a $1$ (black) as the observation. The third row shows the belief vector: the agent's probability distribution over the hidden states given the history. The lick action is denoted by the small orange triangle. (B) The grey curve shows the progression of the agent's belief in being in the signal phase (signal belief) over time. The blue and orange curves show the resulting attention and lick choice probabilities. White and black dots indicate $0$ and $1$ observations, respectively. During low-attention instances, the change in signal belief is more gradual (slow ramping shown). In contrast, observations made under high attention lead to steep upward or downward jumps in signal belief, depending on whether the observations favor signal or noise, respectively.
  • Figure 4: The trained agent shows (A) an increase in the fraction of hits and false alarms, and (B) a decrease in reaction times, as the food reward increases (similar to the experimental data in Fig. 1B). (C) The agent pays more attention in a trial as the food reward increases. (D) The agent's policy distribution plotted as a function of the agent's belief in being in the signal phase. The attention choice probability increases gradually as the signal belief rises from low to high values, whereas the lick choice probability stays flat at $0$ until a certain signal belief, after which it increases steeply. Change in (E) attention and (F) lick choice probabilities for increasing food rewards. (G) Change in the autocorrelation of attention sequences for varying food rewards, with the corresponding raster plot of attention times. (H) The autocorrelation and raster plot for changes in signal duration.
  • Figure 5: (A) Autocorrelation of attention sequences as we increase the low-attention certainty from $0.5$. The side peaks diminish as the low-attention certainty increases. (B) The distribution of wait times as a function of the signal belief for varying food reward sizes. The corresponding minimum wait times are shown in (C). (D) The number of high-attention instances in a trial decreases with increasing low-attention certainty.
  • ...and 2 more figures
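
The task structure in the Figure 2 caption and the belief dynamics in the Figure 3 caption can be illustrated with a minimal sketch. The values $p=0.024$, $N=25$ and the 3-second signal duration come from the captions. Everything else is an assumption made for illustration only: the symmetric binary observation model, the certainty values `HIGH_CERTAINTY` and `LOW_CERTAINTY`, the return to the noise state after the last signal state, and the function names `predict`, `update`, and `signal_belief`. This is not the authors' implementation.

```python
# Parameters stated in the Figure 2 caption.
P_SIGNAL_START = 0.024    # per-step probability of leaving the noise state
N_SIGNAL_STATES = 25      # number of consecutive signal states
SIGNAL_SECONDS = 3.0      # total signal duration

# Derived quantities: step length and mean noise duration.
DT = SIGNAL_SECONDS / N_SIGNAL_STATES        # about 0.12 s per time step
MEAN_NOISE_SECONDS = DT / P_SIGNAL_START     # about 5 s, as in the caption

# Hypothetical observation certainties (not from the paper). A certainty of
# 0.5 would make observations uninformative.
HIGH_CERTAINTY = 0.9
LOW_CERTAINTY = 0.6


def predict(belief):
    """Push the belief one step through the state machine.

    Index 0 is the noise state; indices 1..N are the signal states.
    Assumption: after the last signal state the world returns to noise.
    """
    nxt = [0.0] * (N_SIGNAL_STATES + 1)
    nxt[0] = belief[0] * (1.0 - P_SIGNAL_START) + belief[N_SIGNAL_STATES]
    nxt[1] = belief[0] * P_SIGNAL_START
    for k in range(1, N_SIGNAL_STATES):
        nxt[k + 1] = belief[k]
    return nxt


def update(belief, obs, certainty):
    """Bayes update for a binary observation (assumed symmetric channel).

    `certainty` is the probability that the observation matches the phase:
    P(obs=1 | signal) = P(obs=0 | noise) = certainty, with 0 < certainty < 1.
    """
    like_noise = certainty if obs == 0 else 1.0 - certainty
    like_signal = certainty if obs == 1 else 1.0 - certainty
    post = [belief[0] * like_noise] + [b * like_signal for b in belief[1:]]
    total = sum(post)
    return [x / total for x in post]


def signal_belief(belief):
    """Total probability assigned to the signal states."""
    return sum(belief[1:])


# Start certain of noise, take one step, then observe a 1.
prior = predict([1.0] + [0.0] * N_SIGNAL_STATES)
after_high = update(prior, 1, HIGH_CERTAINTY)
after_low = update(prior, 1, LOW_CERTAINTY)
print("time step (s):", DT)
print("mean noise duration (s):", MEAN_NOISE_SECONDS)
print("signal belief before observing:", signal_belief(prior))
print("signal belief, high attention:", signal_belief(after_high))
print("signal belief, low attention:", signal_belief(after_low))
```

Under these assumptions, the same observation moves the signal belief further when it is made with the higher certainty, which is the qualitative contrast between steep jumps and slow ramping described in the Figure 3 caption.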