Learning to Chain Operations by Routing Information Through a Global Workspace

Hugo Chateau-Laurent, Rufin VanRullen

TL;DR

This work introduces a Global Workspace Theory–inspired model that routes information among specialized modules via a gating controller to perform sequential, System-2–like reasoning. The approach chains operations to perform arithmetic addition, and is demonstrated in both a hand-designed one-hot digit setup and a learned MNIST-based setup, with a central global workspace updated through gated module interactions. Empirically, the Global Workspace architecture generalizes to unseen additions and outperforms LSTM and Transformer baselines on both interpolation and extrapolation, despite having fewer parameters. The findings suggest that workspace-based, multi-module architectures can enhance deep learning's reasoning and cross-modal capabilities, with potential extensions to more complex, multimodal tasks and to analogues of unconscious versus conscious processing.

Abstract

We present a model inspired by the Global Workspace Theory that integrates specialized modules to perform a sequential reasoning task. A controller selectively routes information between modules through the workspace using a gating mechanism. This approach allows the model to chain operations by iteratively broadcasting information between specialized domains, mimicking System-2 reasoning. We evaluate the model's performance on a simple addition task, where two addends must be summed. The task can be solved by routing information sequentially through an Input module, an Increment module (multiple times), and finally an Output module. We consider two implementations of this system with increasing complexity. First, using hand-designed modules operating on one-hot digit representations, the controller (an LSTM recurrent network) learns to select the appropriate modules (input, increment, output) in the appropriate sequence. Second, we replace the hand-designed modules with learned representation modules for MNIST images and an increment module trained on the task objectives; here again, the controller learns the appropriate sequential module selection to solve the task. Finally, we show that the Global Workspace model, while having fewer parameters, outperforms LSTMs and Transformers when tested on unseen addition operations (both interpolations and extrapolations of addition operations seen during training). Our results highlight the potential of architectures inspired by the Global Workspace Theory to enhance deep learning's reasoning capabilities.
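
The chaining idea described in the abstract can be illustrated with a minimal sketch of the hand-designed one-hot setting. This is not the authors' implementation: the function names (`one_hot`, `increment`, `add_by_chaining`) are hypothetical, and the wrap-around behavior of the shift (9 incremented becomes 0) is an assumption made only to keep the example self-contained.

```python
N_DIGITS = 10  # digits 0..9, one-hot encoded


def one_hot(digit):
    """Return a one-hot list of length N_DIGITS with a 1.0 at index `digit`."""
    return [1.0 if i == digit else 0.0 for i in range(N_DIGITS)]


def increment(vec):
    """Shift the one-hot vector by one position (index i moves to i + 1).

    The wrap-around at the last index is an assumption of this sketch.
    """
    return [vec[(i - 1) % N_DIGITS] for i in range(N_DIGITS)]


def add_by_chaining(left_addend, right_addend):
    """Add two digits by routing: Input once, Increment A_L times, Output once."""
    workspace = one_hot(right_addend)        # Input module: write A_R into the workspace
    for _ in range(left_addend):             # Increment module, applied A_L times
        workspace = increment(workspace)
    return workspace.index(max(workspace))   # Output module: read the digit back out


print(add_by_chaining(3, 2))
```

In this sketch the module order is fixed by hand; in the model described above, that order is what the controller has to learn.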

Paper Structure

This paper contains 11 sections, 4 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Example input, output, and target. The input is the left addend (three in the example) and the one-hot encoding of the right addend (two in the example; in our second, MNIST model, this one-hot input is replaced by an image input from the MNIST dataset). The right addend is only shown in the first time step. The model outputs predicted sums (again, as a one-hot vector), which are compared to the target.
  • Figure 2: Architecture of the Global Workspace model. In the one-hot model, encoders and decoders are set to the Identity, while the Operator/Incrementer module's weight matrix is set manually. In the MNIST model, input images $A_R(t)$ are encoded via a pretrained VAE $\mathcal{E}$; Global Workspace encoders and decoders are trained for multimodal representation objectives, and the Operator/Incrementer is trained from scratch to update the current GW representation given a task instruction $A_L=1..9$ and a ground-truth ("oracle") routing sequence. For both models, the router (in red) is finally trained to produce a sequence of "gates" for the three modules. Given a task instruction $A_L=n$, the expected gating sequence (after learning) is to first select the input module to load the input digit into the GW, then call the Operator module $n$ times in a row to perform the addition and update the GW, and finally select the output module to return the GW representation as a one-hot digit answer.
  • Figure 3: Comparison between the hand-designed incrementer of the one-hot model (top) and the behavior of the MNIST model with $A_L=1$ (bottom). The bottom panel represents the softmaxed output (activity of the digit domain) at $t=2$ (i.e. after three routing steps). The operators of both models shift the digit by one in their respective representation.
  • Figure 4: Example behavior of the Global Workspace model adding $A_L=9$ to $A_R=0$. The top panel displays the three output gates of the Router over the simulation time steps. The next panel shows the one-hot encoded input (only available at time step 0). The third panel is the state of the global workspace (one-hot encoded). The last panel reflects the Output module (one-hot encoding). The Router appears to sequentially select the appropriate modules: first the Input, then the Increment module for 9 successive steps (allowing the global workspace representation to increase), and finally the Output producing the correct answer.
  • Figure 5: Architecture of the multimodal global workspace. The MNIST images are encoded using $\mathcal{E}$. Furthermore, both the vision and digit domains can send information to the workspace through their encoder $e$, and read from it using their decoder $d$.
  • ...and 9 more figures
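
The expected gating sequence described in the captions of Figures 2 and 4 (Input once, Operator $n$ times, Output once) can be sketched as follows. This is an illustration under stated assumptions, not the paper's update rule: the names `oracle_gates` and `run_workspace` are hypothetical, the gates are taken to be hard (exactly 0 or 1), and the additive workspace update in which the Output gate simply holds the workspace is an assumption of this sketch.

```python
N_DIGITS = 10  # digits 0..9, one-hot encoded


def one_hot(digit):
    return [1.0 if i == digit else 0.0 for i in range(N_DIGITS)]


def shift_by_one(vec):
    # Hand-designed incrementer: index i moves to i + 1 (wrap-around assumed).
    return [vec[(i - 1) % N_DIGITS] for i in range(N_DIGITS)]


def oracle_gates(n):
    """Gate triplets (input, operator, output) for the instruction A_L = n."""
    return [(1.0, 0.0, 0.0)] + [(0.0, 1.0, 0.0)] * n + [(0.0, 0.0, 1.0)]


def run_workspace(gates, right_addend):
    """Apply a gating sequence; the input digit is only visible at step 0."""
    workspace = [0.0] * N_DIGITS
    output = [0.0] * N_DIGITS
    for t, (g_in, g_op, g_out) in enumerate(gates):
        x = one_hot(right_addend) if t == 0 else [0.0] * N_DIGITS
        shifted = shift_by_one(workspace)
        # Assumed update: each gate scales one candidate for the next workspace state.
        workspace = [
            g_in * x[i] + g_op * shifted[i] + g_out * workspace[i]
            for i in range(N_DIGITS)
        ]
        # The Output module emits the workspace content only when its gate is open.
        output = [g_out * w for w in workspace]
    return output


gates = oracle_gates(9)            # A_L = 9, as in the Figure 4 example
answer = run_workspace(gates, 0)   # A_R = 0
print(len(gates), answer.index(max(answer)))
```

With $A_L=9$ the sequence has 11 steps (one Input, nine Operator, one Output), which matches the behavior described for Figure 4. A learned router would output soft gate values rather than the hard ones used here.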