Sparks of cognitive flexibility: self-guided context inference for flexible stimulus-response mapping by attentional routing

Rowan P. Sommers, Sushrut Thorat, Daniel Anthes, Tim C. Kietzmann

TL;DR

This work addresses the challenge of rapidly remapping stimulus–response associations according to context in vision-based tasks. It introduces the Wisconsin Neural Network (WiNN), which combines a frozen pretrained CNN backbone with a fast context-inference module that modulates attention. Fast inference is followed by slow updates restricted to the attention and readout layers. WiNN infers rules efficiently and needs fewer examples to do so than control models. It generalizes to unseen and compositional rules, and once slow learning has established a foundation, it can infer rules solely by updating its context. These results suggest a path toward context-sensitive models that preserve learned representations while adapting quickly to complex, rule-based tasks.
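The sketch below is a rough illustration of how a context state could modulate attention in one CNN layer. It maps a context vector to one gain per channel and rescales that layer's feature map. The channel-wise sigmoid gating and the names `modulate_layer`, `W_att`, and `b_att` are assumptions made for illustration. This summary does not specify the exact form of WiNN's attention mechanism.

```python
import numpy as np


def modulate_layer(feature_map, c, W_att, b_att):
    """Rescale each channel of a (C, H, W) feature map by a context-dependent gain.

    Hypothetical form: gains = sigmoid(W_att @ c + b_att), one gain in (0, 1)
    per channel, so the context state decides which channels pass through.
    """
    gains = 1.0 / (1.0 + np.exp(-(W_att @ c + b_att)))  # shape (C,)
    return feature_map * gains[:, None, None], gains


# Two channels and a two-dimensional context; the weights are arbitrary toy values.
W_att = np.array([[5.0, -5.0],
                  [-5.0, 5.0]])
b_att = np.zeros(2)
features = np.ones((2, 3, 3))  # stand-in for the output of a frozen CNN layer

out_a, gains_a = modulate_layer(features, np.array([1.0, 0.0]), W_att, b_att)
out_b, gains_b = modulate_layer(features, np.array([0.0, 1.0]), W_att, b_att)
# gains_a passes channel 0 and suppresses channel 1; gains_b does the reverse.
```

Only `c` changes between the two calls, yet the same frozen features are routed differently. This illustrates how updating a context state alone could redirect processing without altering the backbone's weights.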

Abstract

Flexible cognition demands discovering hidden rules to quickly adapt stimulus-response mappings. Standard neural networks struggle with tasks that require such rapid, context-driven remapping. Recently, Hummos (2023) introduced a fast-and-slow learning algorithm to mitigate this shortcoming, but its scalability to complex, image-computable tasks was unclear. Here, we propose the Wisconsin Neural Network (WiNN), which extends Hummos' fast-and-slow learning to image-computable tasks demanding flexible rule-based behavior. WiNN employs a pretrained convolutional neural network for vision, coupled with an adjustable "context state" that guides attention to relevant features. If WiNN produces an incorrect response, it first iteratively updates its context state to refocus attention on task-relevant cues, then performs minimal parameter updates to the attention and readout layers. This strategy preserves generalizable representations in the sensory and attention networks, reducing catastrophic forgetting. We evaluate WiNN on an image-based extension of the Wisconsin Card Sorting Task, revealing several markers of cognitive flexibility: (i) WiNN autonomously infers underlying rules, (ii) requires fewer examples to do so than control models reliant on large-scale parameter updates, (iii) can perform context-based rule inference solely via context-state adjustments, an ability further enhanced by slow updates of the attention and readout parameters, and (iv) generalizes to unseen compositional rules through context-state updates alone. By blending fast context inference with targeted attentional guidance, WiNN achieves "sparks" of flexibility. This approach offers a path toward context-sensitive models that retain knowledge while rapidly adapting to complex, rule-based tasks.
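The trial-by-trial procedure described above can be sketched on a toy model. In this sketch, a fixed feature vector `x` stands in for the frozen backbone. Attention is a sigmoid gain `g = sigmoid(A @ c)` per feature, and a logistic readout predicts whether the image follows the hidden rule. All names, the gating form, the learning rates, and the iteration cap are assumptions and are not taken from the paper. Setting `update_slow=False` gives a context-only variant in which only the context state changes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, c, params):
    """Frozen features x -> context-gated features -> P(image follows the rule)."""
    g = sigmoid(params["A"] @ c)  # per-feature attention gains set by the context state
    p = sigmoid(params["w"] @ (g * x) + params["b"])
    return g, p


def grads(x, y, c, params):
    """Analytic gradients of the binary cross-entropy loss w.r.t. c, A, w and b."""
    g, p = forward(x, c, params)
    dz = p - y                                 # dL/dz for a sigmoid readout with BCE
    dh = dz * params["w"] * x * g * (1.0 - g)  # back through g = sigmoid(A @ c)
    return {"c": params["A"].T @ dh, "A": np.outer(dh, c), "w": dz * g * x, "b": dz}


def trial_update(x, y, c, params, lr_c=1.0, lr_slow=0.05, max_iters=100,
                 update_slow=True):
    """One trial: fast context inference while the response is wrong,
    then (optionally) a single slow step on attention and readout only."""
    params = dict(params)  # avoid mutating the caller's dictionary
    for _ in range(max_iters):
        _, p = forward(x, c, params)
        if (p > 0.5) == bool(y):               # response is correct: stop inferring
            break
        c = c - lr_c * grads(x, y, c, params)["c"]
    if update_slow:
        step = grads(x, y, c, params)
        for k in ("A", "w", "b"):              # the backbone features x are never changed
            params[k] = params[k] - lr_slow * step[k]
    return c, params


# Toy setup: 8 features and a 4-dimensional context, with values chosen for illustration.
d, k = 8, 4
params = {"A": np.full((d, k), 0.1), "w": np.ones(d), "b": -2.0}
x = np.ones(d)
c = np.zeros(k)
c_new, params_new = trial_update(x, 0, c, params)  # initially wrong, so c is adjusted
```

The design keeps a clear separation: many cheap updates touch only `c`, and the comparatively expensive weight changes happen once per trial and never reach the backbone.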

Paper Structure

This paper contains 26 sections and 4 figures.

Figures (4)

  • Figure 1: Setup. (A) During an experiment, images are presented in succession. For each image, the task is to decide whether it adheres to a hidden rule. The hidden rule switches after $800$ images, forming a "Block." (B) The Wisconsin Neural Network (WiNN) is built for flexible rule inference over complex image streams. A pretrained convolutional neural network (CNN) maps the image to a response that is modulated by the inferred context. The left panel illustrates how the context state $c$ modulates the $l^\mathrm{th}$ CNN layer. Whenever the CNN produces an error, $c$ is updated iteratively to adjust the attentional weights and remap the network’s response. Subsequently, or if the response is already correct, a single update is applied to the attention and readout parameters.
  • Figure 2: WiNN infers rules efficiently. (A) The learning dynamics of WiNN during part of an experiment. On the left, towards the end of a rule block, WiNN mostly produces correct responses and therefore mostly performs attention/readout parameter updates. A rule change elicits a large error, which engages WiNN's context-inference loop. This loop minimizes the loss and corrects the response, and a single parameter update step follows. (B) WiNN infers rules better than the control models: it generalizes the seen S-R mappings to unseen stimuli more successfully. (C) WiNN infers rules faster than the control models: it needs to see fewer stimuli before it responds accurately to subsequent stimuli. (D) Overall, WiNN achieves more efficient rule inference than the CNN control models. (E) Unfreezing the backbone still allows efficient rule inference. However, as shown in the Appendix (Figure 4), generalization to unseen rules is then harder. By contrast, pretraining the backbone is important for WiNN's efficient rule inference.
  • Figure 3: WiNN can infer rules solely with context-state updates. (A) WiNN can infer seen rules solely with context-state updates, and this inference improves over the course of the experiment. Ablation analyses suggest that slow attention/readout learning is key to this ability, because it "prepares the ground" for context-only inference. (B) WiNN can also infer unseen simple rules and compositions of seen rules, although not perfectly. This suggests that the learned attention/readout mappings to and from the backbone are general enough to extend to unseen rules. (C) Generalization to seen simple rules is not accompanied by an interpretable context-state space. Neither the latent factors (e.g. wall/shape) nor the color values (e.g. red/blue) are reflected in the structure of the context space. (D) Generalization to compositional rules is accompanied by compositional similarity in the context space. The context state of a compositional rule is more similar to the context states of its component rules than to those of non-component rules.
  • Figure 4: The same analysis as in Figure 3 shows that context-only inference is worse in the WiNN variant with an unfrozen pretrained backbone than in WiNN.
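A small helper can illustrate the comparison in Figure 3 (D). It measures how similar a compositional rule's context state is to the states of its component rules and, separately, to the states of all other rules. The cosine similarity measure, the function name, and the toy rule labels and vectors below are all assumptions made for illustration. The captions do not state which similarity measure the authors used.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def compositional_similarity(states, composite, components):
    """Mean cosine similarity of a composite rule's context state to the states
    of its component rules and to the states of all non-component rules."""
    target = states[composite]
    comp = [cosine(target, states[r]) for r in components]
    non = [cosine(target, v) for r, v in states.items()
           if r != composite and r not in components]
    return float(np.mean(comp)), float(np.mean(non))


# Hypothetical inferred context states, one per rule (toy 3-D vectors).
states = {
    "wall=red": np.array([1.0, 0.0, 0.0]),
    "shape=blue": np.array([0.0, 1.0, 0.0]),
    "wall=blue": np.array([0.0, 0.0, 1.0]),
    "wall=red AND shape=blue": np.array([1.0, 1.0, 0.0]),
}
comp_sim, non_sim = compositional_similarity(
    states, "wall=red AND shape=blue", ["wall=red", "shape=blue"])
# comp_sim ≈ 0.707 and non_sim == 0.0: the composite state is closer to its components.
```

In a real analysis, the vectors would presumably be the context states that the model settles on for each rule. The sketch only shows the shape of the comparison.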