
Emergence of Functionally Differentiated Structures via Mutual Information Minimization in Recurrent Neural Networks

Yuki Tomoda, Ichiro Tsuda, Yutaka Yamaguti

TL;DR

This work investigates how functional differentiation and modularity emerge in recurrent neural networks under a global information-theoretic constraint. The authors minimize the mutual information (MI) between two predefined neural subgroups, estimated with a MINE-based estimator, and evaluate the approach on two tasks: a 2-bit working memory benchmark and the separation of mixed Lorenz and Rössler chaotic signals. On both tasks, the networks achieve high task performance and form distinct functional modules, and the specialization of input and output weights aligns with these functional roles. Functional modularity arises early in learning, whereas structural modularity develops more gradually, so functional separation often precedes structural reorganization. The findings offer a principled perspective on how information-theoretic constraints can shape brain-like differentiation and modularity, with implications for biological understanding and for the design of modular, multi-task AI systems, particularly when sparsity constraints help translate functional separation into structure.
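MINE estimates mutual information by training a statistics network $T$ to maximize the Donsker-Varadhan lower bound $I(X;Z) \ge \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}[e^{T}]$, where $P$ is the joint distribution and $Q$ the product of marginals. The following is a minimal pure-Python sketch of that bound, not the authors' implementation: instead of a trained network it plugs in the analytically optimal statistic for a standard bivariate Gaussian, and the helper names (`dv_mi_estimate`, `gaussian_log_ratio`, `sample_pairs`) are hypothetical. In the paper's setting, $X$ and $Z$ would be the activities of the two neural subgroups.

```python
import math
import random


def dv_mi_estimate(xs, zs, statistic, rng):
    """Donsker-Varadhan lower bound on I(X;Z), the quantity MINE maximizes.

    Joint samples are the paired (x_i, z_i); samples from the product of
    marginals are obtained by shuffling z, as MINE does.
    """
    n = len(xs)
    joint_term = sum(statistic(x, z) for x, z in zip(xs, zs)) / n
    z_shuffled = list(zs)
    rng.shuffle(z_shuffled)  # breaks the pairing -> approx. p(x) p(z)
    marginal_term = sum(math.exp(statistic(x, z)) for x, z in zip(xs, z_shuffled)) / n
    return joint_term - math.log(marginal_term)


def gaussian_log_ratio(rho):
    """Hypothetical stand-in for a trained statistics network.

    For a standard bivariate Gaussian with correlation rho, the optimal
    statistic is T*(x, z) = log p(x, z) / (p(x) p(z)); with it the DV bound
    equals the true MI, -0.5 * log(1 - rho**2).
    """
    s = 1.0 - rho * rho

    def t(x, z):
        return -0.5 * math.log(s) - (rho * rho * (x * x + z * z) - 2.0 * rho * x * z) / (2.0 * s)

    return t


def sample_pairs(rho, n, rng):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
    return xs, zs
```

With `rho = 0.5` and 20,000 samples the estimate should land close to the analytic value $-\tfrac{1}{2}\log(1-\rho^2) \approx 0.144$, while for independent samples the same statistic gives a value at or below zero, the regime that MI minimization pushes the subgroups toward.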

Abstract

Functional differentiation in the brain emerges as distinct regions specialize and is key to understanding brain function as a complex system. Previous research has modeled this process using artificial neural networks with specific constraints. Here, we propose a novel approach that induces functional differentiation in recurrent neural networks by minimizing the mutual information between neural subgroups, estimated with mutual information neural estimation (MINE). We apply our method to a 2-bit working memory task and a chaotic signal separation task involving Lorenz and Rössler time series. Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Importantly, our results show that functional differentiation, measured through correlation structure, emerges earlier than structural modularity, defined by synaptic weights. This suggests that functional specialization precedes and probably drives structural reorganization within developing neural networks. Our findings provide new insights into how information-theoretic principles may govern the emergence of specialized functions and modular structures during artificial and biological brain development.
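Figure 5 characterizes functional differentiation by the absolute correlation $|c_i^{(k)}|$ between each neuron's activity and output signal $k$. A minimal plain-Python sketch of that measurement follows; the function names are hypothetical, and `preferred_output`, which assigns each neuron to the output it tracks most strongly, is an illustrative summary rather than a metric taken from the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0  # a constant trace carries no correlation information
    return cov / math.sqrt(va * vb)


def output_correlations(activity, outputs):
    """For each neuron's time series, return (|c_i^(1)|, |c_i^(2)|, ...).

    activity: list of per-neuron activity traces.
    outputs:  list of per-output signal traces (same length as each trace).
    """
    return [tuple(abs(pearson(trace, out)) for out in outputs) for trace in activity]


def preferred_output(abs_corrs):
    """Hypothetical summary: index of the output each neuron correlates with most."""
    return [max(range(len(c)), key=lambda k: c[k]) for c in abs_corrs]
```

Plotting the two coordinates of each tuple against each other gives the kind of scatter shown in Figure 5, where functionally differentiated neurons cluster near one axis.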

Paper Structure

This paper contains 50 sections, 23 equations, and 16 figures.

Figures (16)

  • Figure 1: Schematics of the two experimental tasks. (A) Working memory task. The network receives brief pulses on four input channels (ON$_1$, OFF$_1$, ON$_2$, OFF$_2$) and must maintain two independent memory bits (output values of $\pm 1.0$), each set by the most recent pulse on its ON/OFF channel pair. (B) Chaotic signal separation task. The network receives a mixed 3-dimensional input signal from the Lorenz and Rössler chaotic systems and must separate it into two distinct 3-dimensional output signals, one for each system.
  • Figure 2: Schematic of the hidden state update in the two RNN architectures used in this study. (A) Leaky-integrator RNN and (B) GRU. $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent activation function. Weight parameters to be learned are represented by shaded circles.
  • Figure 3: Training process of the MM and SM. Training phases A and B are repeated alternately. BP indicates backpropagation. Horizontal hatching patterns indicate which network is being optimized. (A) SM training: maximizes the MI estimate between the neural subgroups using MINE. (B) MM training: minimizes the total loss, which combines the task loss and the MI between the two neural subgroups.
  • Figure 4: Dynamics of the working memory task. The four panels, from top to bottom, show: (A) input pulses for the $\text{ON}_1$ and $\text{OFF}_1$ signals, (B) network output (solid) compared with the target signal (dashed) for memory bit 1, (C) input pulses for the $\text{ON}_2$ and $\text{OFF}_2$ signals, and (D) network output (solid) compared with the target signal (dashed) for memory bit 2. In (B) and (D), the first 1,000 time steps are a transient period during which the outputs are not evaluated for the task loss calculation.
  • Figure 5: Correlation between neural activity and output signals in the working memory task. Each point represents a neuron, with coordinates determined by its absolute correlation with output signal 1 ($|c_i^{(1)}|$) and output signal 2 ($|c_i^{(2)}|$). (A) Results with MI minimization. (B) Results without MI minimization. Neurons in group 1 are represented by circles and those in group 2 by triangles.
  • ...and 11 more figures
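The working memory task in Figure 1(A) and Figure 4 can be sketched as a 2-bit flip-flop generator. Only the channel layout (ON$_1$, OFF$_1$, ON$_2$, OFF$_2$), the $\pm 1.0$ targets, and the 1,000-step transient come from the captions; the hypothetical `pulse_prob` parameter, the one-step unit pulses, and the initial state `init` are assumptions made for illustration.

```python
import random


def flip_flop_task(n_steps, pulse_prob=0.01, transient=1000, init=-1.0, seed=0):
    """Hypothetical generator of inputs, targets, and a loss mask for a 2-bit memory task.

    Input channels are ordered (ON_1, OFF_1, ON_2, OFF_2). Each target bit
    becomes +1.0 after an ON pulse and -1.0 after an OFF pulse, and holds its
    value until the next pulse on that bit's channel pair.
    Assumptions: one-step pulses of amplitude 1.0, at most one pulse per bit
    per step, and both bits starting at `init`.
    """
    rng = random.Random(seed)
    inputs, targets, mask = [], [], []
    state = [init, init]
    for t in range(n_steps):
        x = [0.0, 0.0, 0.0, 0.0]
        for bit in range(2):
            if rng.random() < pulse_prob:
                on = rng.random() < 0.5
                x[2 * bit + (0 if on else 1)] = 1.0  # ON_k or OFF_k channel
                state[bit] = 1.0 if on else -1.0
        inputs.append(x)
        targets.append(list(state))
        mask.append(t >= transient)  # transient steps are excluded from the task loss
    return inputs, targets, mask
```

The mask lets a training loop skip the first 1,000 steps when computing the task loss, mirroring the unevaluated transient in Figure 4.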
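For the chaotic signal separation task in Figure 1(B), the sketch below integrates the Lorenz and Rössler systems with a fourth-order Runge-Kutta step and forms a mixed 3-dimensional input. The classic parameter values (σ = 10, ρ = 28, β = 8/3 for Lorenz; a = b = 0.2, c = 5.7 for Rössler), the initial conditions, the step size, and the element-wise-sum mixing are all assumptions; the paper's mixing and scaling may differ, and the function names are hypothetical.

```python
def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz vector field (classic parameters, assumed)."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rossler(s, a=0.2, b=0.2, c=5.7):
    """Rossler vector field (classic parameters, assumed)."""
    x, y, z = s
    return (-y - z, x + a * y, b + z * (x - c))


def rk4_step(f, s, dt):
    """One fourth-order Runge-Kutta step for ds/dt = f(s)."""
    k1 = f(s)
    k2 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + dt * ki for si, ki in zip(s, k3)))
    return tuple(
        si + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        for si, a1, a2, a3, a4 in zip(s, k1, k2, k3, k4)
    )


def trajectory(f, s0, dt, n):
    """Integrate n points (including s0) of the system f."""
    out = [tuple(s0)]
    for _ in range(n - 1):
        out.append(rk4_step(f, out[-1], dt))
    return out


def mixed_separation_data(n=2000, dt=0.01):
    """Return (mixed input, Lorenz target, Rossler target), each a list of 3-tuples."""
    lor = trajectory(lorenz, (1.0, 1.0, 1.0), dt, n)
    ros = trajectory(rossler, (1.0, 1.0, 0.0), dt, n)
    # Hypothetical mixing: element-wise sum of the two unscaled signals.
    mixed = [tuple(a + b for a, b in zip(p, q)) for p, q in zip(lor, ros)]
    return mixed, lor, ros
```

Because the two attractors have different amplitudes, a practical version would likely normalize each signal before mixing; that step is omitted here to keep the sketch short.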
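Figure 2(A) refers to a leaky-integrator RNN. A widely used form of its update is $h_{t+1} = (1-\alpha)\,h_t + \alpha \tanh(W_{\text{rec}} h_t + W_{\text{in}} x_t + b)$, and the sketch below implements that form with plain lists. The symbols $\alpha$, $W_{\text{rec}}$, $W_{\text{in}}$, and $b$ and the function names are assumptions; the paper's exact parameterization may differ.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leaky_rnn_step(h, x, w_rec, w_in, b, alpha):
    """One update of a leaky-integrator RNN in a commonly used (assumed) form:

        h_{t+1} = (1 - alpha) * h_t + alpha * tanh(W_rec h_t + W_in x_t + b)

    alpha in (0, 1] is the leak rate: each unit keeps a (1 - alpha) fraction
    of its previous state and moves toward the tanh drive by alpha.
    """
    pre = [r + i + bi for r, i, bi in zip(matvec(w_rec, h), matvec(w_in, x), b)]
    return [(1.0 - alpha) * hi + alpha * math.tanh(p) for hi, p in zip(h, pre)]
```

Setting `alpha = 1.0` recovers a standard tanh RNN, while smaller values make each unit integrate its input more slowly, which can help a network hold a state such as a memory bit.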