Emergence of hybrid computational dynamics through reinforcement learning

Roman A. Kononov, Nikita A. Pospelov, Konstantin V. Anokhin, Vladimir V. Nekorkin, Oleg V. Maslennikov

TL;DR

Reinforcement learning spontaneously discovers hybrid attractor architectures, combining stable fixed-point attractors for decision maintenance with quasi-periodic attractors for flexible evidence integration, and sculpts functionally balanced neural populations through a powerful form of implicit regularization.

Abstract

Understanding how learning algorithms shape the computational strategies that emerge in neural networks remains a fundamental challenge in machine intelligence. While network architectures receive extensive attention, the role of the learning paradigm itself in determining emergent dynamics remains largely unexplored. Here we demonstrate that reinforcement learning (RL) and supervised learning (SL) drive recurrent neural networks (RNNs) toward fundamentally different computational solutions when trained on identical decision-making tasks. Through systematic dynamical systems analysis, we reveal that RL spontaneously discovers hybrid attractor architectures, combining stable fixed-point attractors for decision maintenance with quasi-periodic attractors for flexible evidence integration. This contrasts sharply with SL, which converges almost exclusively to simpler fixed-point-only solutions. We further show that RL sculpts functionally balanced neural populations through a powerful form of implicit regularization -- a structural signature that enhances robustness and is conspicuously absent in the more heterogeneous solutions found by SL-trained networks. The prevalence of these complex dynamics in RL is controllably modulated by weight initialization and correlates strongly with performance gains, particularly as task complexity increases. Our results establish the learning algorithm as a primary determinant of emergent computation, revealing how reward-based optimization autonomously discovers sophisticated dynamical mechanisms that are less accessible to direct gradient-based optimization. These findings provide both mechanistic insights into neural computation and actionable principles for designing adaptive AI systems.
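The distinction between fixed-point and oscillatory attractors can be illustrated with a toy example. The sketch below is not the authors' analysis pipeline: it assumes a discrete-time, input-free vanilla RNN update h <- tanh(W h) with hand-picked 2x2 weight matrices, and the helper names `run_rnn` and `classify_attractor` are hypothetical. It only separates trajectories that settle from trajectories that keep moving; telling quasi-periodic from periodic orbits would need further analysis (for example, of the power spectrum).

```python
import numpy as np

def run_rnn(W, h0, n_steps=2000):
    """Iterate the autonomous map h <- tanh(W h) and return the full trajectory."""
    traj = np.empty((n_steps + 1, len(h0)))
    traj[0] = h0
    for t in range(n_steps):
        traj[t + 1] = np.tanh(W @ traj[t])
    return traj

def classify_attractor(traj, tail=200, tol=1e-8):
    """Label the long-run behavior from the step sizes in the last `tail` states.

    Assumption: if every late step is smaller than `tol`, the state has
    settled on a fixed point; otherwise it is still moving (oscillatory).
    """
    steps = np.linalg.norm(np.diff(traj[-tail:], axis=0), axis=1)
    return "fixed_point" if steps.max() < tol else "oscillatory"

# Toy weights (illustrative only, not taken from the paper).
theta = 1.0  # rotation angle in radians
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
W_bistable = 1.5 * np.eye(2)   # self-excitation: two stable states of opposite sign
W_rotating = 1.5 * rotation    # unstable origin plus rotation: sustained oscillation

traj_pos = run_rnn(W_bistable, np.array([0.1, 0.1]))
traj_neg = run_rnn(W_bistable, np.array([-0.1, -0.1]))
traj_osc = run_rnn(W_rotating, np.array([0.1, 0.1]))

print(classify_attractor(traj_pos), traj_pos[-1])
print(classify_attractor(traj_neg), traj_neg[-1])
print(classify_attractor(traj_osc))
```

With `W_bistable`, the sign of the initial state selects which of the two stable states is reached, a minimal picture of decision maintenance. With `W_rotating`, the origin is the only fixed point and it is unstable, so the bounded trajectory never settles.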

Paper Structure

This paper contains 24 sections, 5 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Task structure and emergent architecture of a recurrent neural network trained with reinforcement learning. (a) Temporal structure of a single trial, showing the distinct fixation, stimulus, delay, and decision stages. (b) Schematic of the vanilla RNN architecture. (c) Time-resolved output probabilities from a trained network, illustrating the formation of a decision. (d) The first principal component of hidden layer activity, capturing the dominant dynamical mode during a trial. (e) The network's decision boundary in the two-dimensional coherence space, demonstrating successful context-dependent choices. (f) Raster plot of the full hidden layer activity, revealing complex population-level dynamics. (g) Recurrent weight matrix of the trained network, reorganized according to functionally distinct neural populations identified in our analysis. The structured connectivity, featuring strong intra-population excitation and inter-population inhibition, was not pre-specified but emerged entirely through reward-driven learning.
  • Figure 2: Reinforcement learning discovers a richer dynamical landscape than supervised learning. (a–d) Probability of a stimulus-encoding attractor being quasi-periodic (oscillatory) across the coherence parameter space. The learning algorithm fundamentally alters the solution type: reinforcement learning (a, b) consistently finds oscillatory solutions, particularly for ambiguous, low-coherence stimuli, while supervised learning (c, d) overwhelmingly converges to stable fixed points. This divergence is amplified by broader weight initializations (b, d). (e–h) Aggregated statistics confirm the prevalence of quasi-periodic attractors is significantly higher in RL than SL across all initialization widths ($\delta$) and for both context-dependent (e, f) and simple (g, h) tasks. These results establish that the learning rule is a primary determinant of the emergent computational strategy, with RL's exploratory nature discovering complex dynamics that SL actively prunes away.
  • Figure 3: Emergence of a hybrid attractor architecture during reinforcement learning. (a, b) Low-dimensional projections of neural trajectories for trials with opposite primary coherence, showing distinct paths leading to different choices. (c–e) Progressive sculpting of the state space across three stages of training. The network learns to separate computational functions into distinct dynamical regimes. Decisions are represented by two discrete, stable fixed-point attractors (blue and red stars), while sensory evidence is encoded along a continuous manifold (colored surface). An untrained network (c) has an unstructured state space. As training progresses (d), the manifold elongates to separate evidence. The fully trained network (e) exhibits a complete, spontaneous separation of a working memory circuit (the encoding manifold) from a decision circuit (the bistable fixed points), forming a robust hybrid computational system.
  • Figure 4: Population self-organization is coupled to performance gains and bistability. (a, e) Learning curves showing accuracy improvements during training. (b, f, j) Four functionally distinct neural populations emerge during RL, defined by their selective participation in attractor dynamics. Notably, the opposing coherence-selective populations ($\mathrm{G}_+, \mathrm{G}_-$) are sculpted to be approximately equal in size. (c, g, k) The emergence of a bistable decision landscape, allowing for two distinct choices. (d, h, l) Statistical analysis reveals that the jump in task accuracy from the 0.5 plateau is tightly correlated with both the formation of specialized populations and the emergence of decision bistability. This demonstrates a direct link between the network's structural organization, its dynamical capabilities, and its functional performance. (i) A schematic illustrating the bifurcation process leading to a bistable decision landscape.
  • Figure 5: Dynamically-defined populations exhibit distinct information encoding roles. (a, b) Mutual information (MI) between single-neuron activity and the sign of primary coherence. Neurons are ordered according to the population structure identified in Fig. \ref{fig:second} (shown in panel c). (c) The four functional populations ($\mathrm{G}_s, \mathrm{G}_+, \mathrm{G}_-, \mathrm{G}_a$) identified via dynamical systems analysis. (d, e) Example single-neuron activity traces demonstrating selective responses. The MI analysis validates the functional roles of the populations: neurons in $\mathrm{G}_+$ selectively encode positive coherence, while those in $\mathrm{G}_-$ encode negative coherence. This reveals a clear division of labor and confirms that the emergent population structure forms the basis for reliable information processing.
  • ...and 6 more figures
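
The mutual information analysis described in the Figure 5 caption can be sketched as follows. The captions do not state which MI estimator the authors use, so this is only one plausible version: a plug-in estimate over discrete variables, with each neuron's trial-averaged activity binarized at its median. The function name `mutual_information_bits` and the toy activity values are hypothetical.

```python
import numpy as np

def mutual_information_bits(x, y):
    """Plug-in mutual information (in bits) between two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint histogram of (x, y) pairs, normalized to a probability table.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# Toy data: 8 trials with alternating sign of primary coherence.
coherence_sign = np.array([1, -1, 1, -1, 1, -1, 1, -1])

# Hypothetical trial-averaged activities of two neurons.
selective = np.where(coherence_sign > 0, 0.9, 0.1)               # tracks the sign
unselective = np.array([0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1])  # ignores it

# Assumption: binarize each neuron's activity at its own median.
mi_selective = mutual_information_bits(selective > np.median(selective), coherence_sign)
mi_unselective = mutual_information_bits(unselective > np.median(unselective), coherence_sign)

print(mi_selective, mi_unselective)
```

In this toy setting a neuron that perfectly tracks the coherence sign carries one bit about it, while a neuron whose activity is independent of the sign carries none. Real recordings would need many more trials, and the plug-in estimate is biased upward for small samples.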