Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors

Puria Radmard, Paul M. Bays, Máté Lengyel

TL;DR

The study tackles the challenge of uncovering neural mechanisms by automatically training biologically plausible RNNs to reproduce full, multimodal behavioural distributions, including suboptimalities like swap errors. It combines synthetic data generation via a Bayesian non-parametric model with diffusion-model-inspired training to produce realistic response distributions, and applies this to a visual working memory task. The results show the trained networks exhibit neural dynamics and representational geometry that resemble macaque prefrontal cortex activity and provide testable predictions about swap-error mechanisms. This approach demonstrates a principled route to automatic mechanism discovery by aligning neural dynamics with rich behavioural statistics rather than optimizing for task performance alone.

Abstract

Discovering the neural mechanisms underpinning cognition is one of the grand challenges of neuroscience. However, previous approaches for building models of RNN dynamics that explain behaviour required iterative refinement of architectures and/or optimisation objectives, resulting in a piecemeal, and mostly heuristic, human-in-the-loop process. Here, we offer an alternative approach that automates the discovery of viable RNN mechanisms by explicitly training RNNs to reproduce behaviour, including the same characteristic errors and suboptimalities, that humans and animals produce in a cognitive task. Achieving this required two main innovations. First, as the amount of behavioural data that can be collected in experiments is often too limited to train RNNs, we use a non-parametric generative model of behavioural responses to produce surrogate data for training RNNs. Second, to capture all relevant statistical aspects of the data, we developed a novel diffusion model-based approach for training RNNs. To showcase the potential of our approach, we chose a visual working memory task as our test-bed, as behaviour in this task is well known to produce response distributions that are patently multimodal (due to swap errors). The resulting network dynamics correctly predicted the qualitative features of macaque neural data. Importantly, these results were not possible to obtain with more traditional approaches, i.e., when only a limited set of behavioural signatures (rather than the full richness of behavioural response distributions) were fitted, or when RNNs were trained for task optimality (instead of reproducing behaviour). Our approach also yields novel predictions about the mechanism of swap errors, which can be readily tested in experiments. These results suggest that fitting RNNs to rich patterns of behaviour provides a powerful way to automatically discover mechanisms of important cognitive functions.
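To make the notion of a multimodal response distribution concrete, the following is a minimal sketch of how swap errors produce a second mode in reported colour. It is an illustration only, not the non-parametric generative model used in the paper: the two-component von Mises mixture, the function name `sample_responses`, and the parameters `swap_prob` and `kappa` are all assumptions introduced here.

```python
import numpy as np


def sample_responses(target, distractor, swap_prob, kappa, n_trials, rng):
    """Draw circular colour reports (radians) from a two-mode mixture.

    Assumption: each trial reports the distractor with probability
    `swap_prob` (a swap error) and the target otherwise, with von Mises
    noise of concentration `kappa` around the selected item.
    """
    # Decide, trial by trial, whether the wrong item is selected.
    swapped = rng.random(n_trials) < swap_prob
    centres = np.where(swapped, distractor, target)
    # Samples lie in [-pi, pi], centred on the selected item's colour.
    responses = rng.vonmises(centres, kappa)
    return responses, swapped


rng = np.random.default_rng(0)
responses, swapped = sample_responses(
    target=0.0, distractor=2.0, swap_prob=0.2,
    kappa=20.0, n_trials=10_000, rng=rng,
)
# Circular mean of the swapped trials sits near the distractor colour.
swap_mode = np.angle(np.mean(np.exp(1j * responses[swapped])))
```

A histogram of `responses` would show one large mode at the target and a smaller one at the distractor. A model that only reports mean error would miss this second mode, which is why the abstract emphasises fitting the full response distribution.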

Paper Structure

This paper contains 27 sections, 14 equations, 15 figures, 2 tables, 2 algorithms.

Figures (15)

  • Figure 1: Left: the typical procedure involves training an RNN to perform optimally in a task, without relating to real behaviour. Right: our novel method can replicate monkey behaviour by training on surrogate training data.
  • Figure 2: Two-item delayed estimation task - minimal VWM task for swap errors.
  • Figure 3: DDPM-style trained RNNs accurately capture swap errors in training data, unlike ablated task-optimal networks. A Dotted lines are the target swap rates used to generate the synthetic training data for various RNNs, and solid lines are average swap rates inferred from fitting BNS to RNN-generated behaviour after they are trained on this synthetic data (Appendix \ref{app:bns}; Radmard2025). RNNs achieve fair success in replicating training data for most trials in both probe distance-dependent and -independent cases. Corresponding sample sets in $\mathbb{R}^2$ are shown in C, where border colours match the line colours here. B Trajectory snapshots at different points in the second delay period, during which the network is trained to denoise behaviour. Trials are coloured by their argument at the end of denoising, which is interpreted as the colour estimate made by the network. Final samples are successfully drawn from their multimodal target distribution (Appendix \ref{app:bns}). C Typical set of final timestep samples for trials with close (left) and far (right) probe feature values. In all cases the red stimulus is cued. Borders indicate generative RNNs, referencing lines in A. Top row (grey border) indicates the task-optimal network. D, E Two typical ablations that may be applied to task-optimal networks to induce swap errors: decreasing probe distance beyond the minimum margin between items seen during training to induce confusion, and increasing process noise variance beyond training conditions to increase misselection likelihood. Swap errors would be evidenced by a second mode at the colour of the distractor, indicated by a blue marker. This does not arise in either case, and further attempts would require subjective ablation design. E inset: Conversely, removing process noise for a network trained to perform swap errors recapitulates optimal behaviour.
  • Figure 4: Behaviourally realistic, but not task-optimal, networks capture neural signatures of VWM. A $\boldsymbol{m}_T$ averaged over many accurate trials with distractor report feature varying, for trials with close (top) and far (bottom) stimulus distance, before (left, planes misaligned) and after (right, planes aligned) cue exposure. Colours represent the report feature of the cued (and accurately recalled) stimulus. Representations are drawn from the network in E (right), which best matches neural data (see below). B This qualitatively matches equivalent representations in macaque cortex - in this case there are only two possible cue values. We find a similar pre- and post-cue geometry for our index-cued networks, which are the closest analogue to the real task. C The real data (dots) further show that this planar alignment (quantified as cosine similarity between normals) increases before and during cue exposure (grey stripe), and remains high throughout the second delay until time of recall. This is matched by our index-cued networks when trained with a data-matched proportion of swap errors (5-10%). Even training with our DDPM-based method with a no-swap target distribution does not achieve this rapid increase during the cue period. A longer second delay period was chosen to encompass all experimental conditions in the original study. D Plane alignment over time for the feature-cued networks. Axes borders indicate the model used, as in Figure \ref{fig:behavioural_reproduction}. Importantly, only the model with distance-dependent swap probability matches the description of the biological data like its index-cued counterparts. This is quantified in Appendix \ref{app:quant_sigmoid}. Error bars show mean ± std across all different spatial configurations with the denoted stimulus location distance (see inset; purple is closest distractor, yellow is furthest). B, C generated with data from Panichello2021.
  • Figure 5: RNNs trained with swap errors provide a prediction about the geometry of item misselection. A Prior work (Alleman2024) suggests 3 causes of swap errors. Only misbinding errors arise during (or before) delay 1, prior to cueing. Misselection and misinterpretation cause incorrect transfer from a probe-report binding in delay 1 to a role-report binding in delay 2. B We repeat linear decoder analyses from Alleman2024 for our index-cued networks. Like the real data, these provide most evidence for swap errors arising during delay 2, ruling out misbinding errors. We defer full details to Alleman2024. C Trial-by-trial $\boldsymbol{m}_T$, projected to a shared set of principal components (PCs). One item's colour is fixed to purple, and the other item's colour is varied and used to colour the scatterplots. Left (right) axes show when the purple item is cued (a distractor), so colours indicate the uncued (target) colour. Probe features are fixed to a close (far) spatial configuration in the top (bottom) row, causing more (fewer) swap errors. Neural activities form two rings (with a gap in each due to the minimum margin between colours) - see main text. To illustrate this better, we used a network that swaps more often than seen for macaques (highest orange line in Figure \ref{fig:behavioural_reproduction}A). D Left: there is high pairwise alignment in the third PC across different stimulus spatial configurations (i.e. fixed locations) if the fixed colour used to generate the two-ring geometry is the same between these configurations - less so if the fixed colour is different (here, the opposite) for all pairs of spatial configurations. See Appendix \ref{app:swaps} for a more graded pairwise comparison. This also applies to the first two PCs (not shown). Right: this third PC discriminates swapped versus correct trials in cases where the fixed colour item (purple here) is cued, across many different probe values.
  • ...and 10 more figures
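
The Figure 4 caption quantifies planar alignment as the cosine similarity between plane normals. A minimal sketch of that measure is given below, under the assumption (made here, not stated in the captions) that activity has already been projected to three dimensions so that each plane has a single normal. The helper names `plane_normal` and `plane_alignment` are hypothetical, and the absolute value is taken because the sign of a fitted normal is arbitrary.

```python
import numpy as np


def plane_normal(points):
    """Unit normal of the best-fit plane through 3-D points.

    The normal is taken as the direction of least variance, i.e. the
    last right singular vector of the mean-centred point cloud.
    """
    centred = points - points.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[-1]


def plane_alignment(points_a, points_b):
    """Absolute cosine similarity between the two fitted plane normals."""
    return float(abs(plane_normal(points_a) @ plane_normal(points_b)))


# Toy example: rings of "colour" representations lying in different planes.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
zeros = np.zeros_like(theta)
ring_xy = np.stack([np.cos(theta), np.sin(theta), zeros], axis=1)
ring_xz = np.stack([np.cos(theta), zeros, np.sin(theta)], axis=1)

# Parallel planes (one shifted along z) versus orthogonal planes.
aligned = plane_alignment(ring_xy, ring_xy + np.array([0.0, 0.0, 1.0]))
misaligned = plane_alignment(ring_xy, ring_xz)
```

In this toy case the parallel rings give an alignment close to 1 and the orthogonal rings give an alignment close to 0. These are the two extremes the caption describes as planes aligned after the cue and misaligned before it.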