Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization

Badr AlKhamissi, C. Nicolò De Sabbata, Greta Tuckute, Zeming Chen, Martin Schrimpf, Antoine Bosselut

TL;DR

MiCRo introduces a brain-inspired modular transformer that partitions every layer into four expert modules corresponding to language, logic, social reasoning, and world knowledge. A three-stage post-training curriculum induces specialization: expert-only training on a small curated dataset, router-only training on the same data, and end-to-end instruction finetuning. The resulting model is both interpretable and steerable at inference time. Empirical results show interpretable routing patterns, causal ablations that confirm each expert's functional contribution, and strong alignment with human behavior on CogBench, alongside competitive performance on reasoning benchmarks. Overall, MiCRo demonstrates that cognitively grounded modularity yields more transparent, steerable, and human-aligned language models without sacrificing performance, suggesting a scalable path toward brain-aligned AI systems.
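
The three-stage curriculum can be read as a schedule of which parameter groups receive gradient updates. The sketch below is illustrative only: the group names (`experts`, `router`, `shared`) and the `CURRICULUM` table are hypothetical labels introduced here, not identifiers from the paper, and the real training code may organize parameters differently.

```python
# Hypothetical parameter-group names; the paper does not name them this way.
# "shared" stands for everything that is neither an expert nor the router.
PARAM_GROUPS = ("experts", "router", "shared")

# Stage -> parameter groups that are updated, following the description
# of the curriculum (experts only, then router only, then everything).
CURRICULUM = {
    "stage_1": {"trainable": {"experts"}, "data": "small curated dataset"},
    "stage_2": {"trainable": {"router"}, "data": "small curated dataset"},
    "stage_3": {
        "trainable": {"experts", "router", "shared"},
        "data": "large-scale instruction tuning dataset",
    },
}


def trainable_flags(stage):
    """Return {group: bool} marking which groups are updated in `stage`."""
    trainable = CURRICULUM[stage]["trainable"]
    return {group: group in trainable for group in PARAM_GROUPS}


if __name__ == "__main__":
    for stage in CURRICULUM:
        print(stage, trainable_flags(stage))
```

In a deep-learning framework, these flags would typically be applied by toggling gradient tracking on the matching parameters before each stage.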

Abstract

Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we propose Mixture of Cognitive Reasoners (MiCRo): a modular, transformer-based architecture post-trained with a curriculum that induces functional specialization across experts. Concretely, we partition the layers of a pretrained language model into four expert modules aligned with well-studied cognitive networks in the human brain. MiCRo offers three key advantages over standard language models. (1) The specialized experts are interpretable and causally meaningful -- ablating a module causes substantial drops on benchmarks requiring its specialized domain. (2) MiCRo's behavior can be dynamically steered at inference time by routing tokens to particular experts (e.g., favoring social over logical reasoning), enabling fine-grained control over outputs. (3) MiCRo outperforms or matches comparable baselines on both machine-learning reasoning benchmarks (e.g., GSM8K, BBH) and alignment to human behavior (CogBench), while maintaining interpretability. Taken together, cognitively grounded functional specialization yields models that are both more human-like and more human-interpretable.
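
To make the ablation and steering claims concrete, here is a minimal sketch of per-token expert selection. It assumes the router produces one score per expert and the highest-scoring expert is chosen. Ablation is modeled by excluding an expert from the candidates and steering by adding a bias to the scores. Both mechanisms, the function name `route_token`, and the expert labels are assumptions made for illustration; the paper may implement them differently.

```python
# Expert labels follow the four domains named in the summary.
EXPERTS = ("language", "logic", "social", "world")


def route_token(logits, ablated=(), bias=None):
    """Pick one expert for a single token at a single layer.

    logits:  dict mapping expert name -> router score (float).
    ablated: experts excluded from selection (assumed ablation mechanism).
    bias:    optional dict expert -> float added to the scores
             (assumed steering mechanism).
    """
    bias = bias or {}
    candidates = {
        name: logits[name] + bias.get(name, 0.0)
        for name in EXPERTS
        if name not in ablated
    }
    if not candidates:
        raise ValueError("at least one expert must remain available")
    # Top-1 routing: the expert with the highest adjusted score wins.
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    scores = {"language": 0.2, "logic": 1.5, "social": 0.9, "world": -0.3}
    print(route_token(scores))                         # logic
    print(route_token(scores, ablated={"logic"}))      # social
    print(route_token(scores, bias={"social": 1.0}))   # social
```

Because the selection is made independently per token and per layer, the same token can be handled by different experts at different depths.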

Paper Structure

This paper contains 51 sections, 23 figures, 6 tables.

Figures (23)

  • Figure 1: Brain-Inspired Modular Language Model. (a) Illustration of major cognitive networks in the human brain. (b) Our proposed Mixture of Cognitive Reasoners (MiCRo) architecture. The MiCRo architecture partitions each transformer block into four expert modules corresponding to analogous brain networks; a router assigns each token to an expert at every layer (i.e., assignments can vary across layers and tokens). (c) Illustration of causal steering via mechanistic ablations: removing a module shifts behavior and degrades domain-relevant performance. (d) Token-level routing on a sample prompt shows semantically coherent expert usage.
  • Figure 2: Training Curriculum for Inducing Specialization. Our brain-inspired Mixture of Cognitive Reasoners (MiCRo) model contains four experts per layer, each aligned with a distinct cognitive network in the brain. In Stage-I, we train only the experts using a small, curated dataset MiCRo$_\text{SFT}$ (see example on the left), providing each expert with an initial inductive bias. In Stage-II, we freeze the rest of the model and train only the router on the same dataset to learn expert selection. In Stage-III, we finetune the entire model end-to-end on a large-scale instruction tuning dataset.
  • Figure 3: Semantically Meaningful Routing Across Experts. Token routing patterns in MiCRo-Llama-1B. Each bar indicates the proportion of tokens routed to a given expert across layers, with variance shown across sentences (n=50). The model exhibits clear domain-specific specialization consistent with the intended brain-inspired organization. For example, social cognition samples are routed to the social expert, while arithmetic tasks are routed to the logic expert. We find that the language expert is consistently activated in the early layers (see Appendix \ref{app:token-routing-patterns} for layer-wise routing plots and results from additional models). Two random samples are shown below each subplot.
  • Figure 4: Expert Ablations Reveal the Causal Contributions of Specialized Modules. Top and bottom panels show results for MiCRo-Llama-1B and MiCRo-Llama-3B, respectively. Removing the Logic expert causes large drops on MATH and GSM8K, while removing the Social expert yields slight gains. For MMLU and BBH, results indicate that some groups of subtasks rely on distinct experts, whereas others draw on overlapping contributions. Results for additional models are in Appendix \ref{app:expert-ablations}.
  • Figure 5: Neuroscience Localizers Recover Functionally Specialized Experts. (a) MiCRo-Llama-1B and (b) MiCRo-Llama-3B. For each model, we apply three neuroscience-inspired localizers (Language, Multiple Demand (MD), and Theory of Mind (ToM)) to examine the selectivity of localized units across experts and layers. Each plot shows the percentage of units in each expert of each layer that belong to the top-10% most selective units in the whole model.
  • ...and 18 more figures
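
The quantity plotted in Figure 5 can be sketched as a small calculation: rank all units in the model by a selectivity score, keep the top 10%, and report what share of each (layer, expert) group falls inside that set. The sketch below assumes per-unit selectivity scores are already available; how the localizers compute those scores is not covered here, and the function name `top_fraction_share` is a hypothetical label.

```python
def top_fraction_share(selectivity, fraction=0.10):
    """Percentage of each group's units among the model-wide top units.

    selectivity: dict mapping (layer, expert) -> non-empty list of
                 per-unit selectivity scores (assumed precomputed).
    fraction:    share of all units treated as "selective" (0.10 = top 10%).
    """
    all_scores = sorted(
        (score for scores in selectivity.values() for score in scores),
        reverse=True,
    )
    # Number of units in the top set, rounded, with at least one unit.
    k = max(1, round(len(all_scores) * fraction))
    threshold = all_scores[k - 1]
    # Units tied with the threshold score are counted as selective.
    return {
        group: 100.0 * sum(score >= threshold for score in scores) / len(scores)
        for group, scores in selectivity.items()
    }


if __name__ == "__main__":
    toy = {
        (0, "language"): [9.0, 8.0, 1.0, 0.5, 0.2],
        (0, "logic"): [0.1, 0.3, 0.4, 0.6, 0.7],
    }
    print(top_fraction_share(toy, fraction=0.2))
```

With the toy scores above and `fraction=0.2`, the two highest-scoring units both sit in the language group, so that group reports 40% and the logic group 0%.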