A Neuroscience-Inspired Dual-Process Model of Compositional Generalization

Alex Noviello, Claas Beger, Jacob Groner, Kevin Ellis, Weinan Sun

TL;DR

This work tackles systematic compositional generalization in neural networks. It introduces Mirage, a dual-process architecture that combines a fast, meta-trained Neural Decomposer (System 1) with an explicit Schema Engine (System 2) to extract and apply prioritized schemas in an iterative refinement loop. On SCAN, Mirage achieves >99% accuracy across all splits, significantly surpassing monolithic transformers and highlighting the importance of separating procedural decomposition from declarative knowledge. The study provides a concrete, interpretable processing account of how compositional reasoning can arise from modular architectures, with potential for broader applicability in flexible reasoning tasks.

Abstract

Deep learning models struggle with systematic compositional generalization, a hallmark of human cognition. We propose Mirage, a neuro-inspired dual-process model that offers a processing account for this ability. It combines a fast, intuitive "System 1" (a meta-trained Transformer) with a deliberate, rule-based "System 2" (a Schema Engine), mirroring the brain's neocortical and hippocampal-prefrontal circuits. Trained to perform general, single-step decomposition on a stream of random grammars, Mirage achieves >99% accuracy on all splits of the SCAN benchmark in a task-agnostic setting. Ablations confirm that the model's systematic behavior emerges from the architectural interplay of its two systems, particularly its use of explicit, prioritized schemas and iterative refinement. In line with recent progress on recursive/recurrent Transformer approaches, Mirage preserves an iterative neural update while externalizing declarative control into an interpretable schema module. Our work provides a concrete computational model for interpreting how compositional reasoning can arise from a modular cognitive architecture.

Paper Structure

This paper contains 26 sections, 3 figures, 1 table, 3 algorithms.

Figures (3)

  • Figure 1: Mirage architecture and inference loop. (A) System 1 (Neural Decomposer) is a meta-trained Transformer for pattern matching. System 2 (Schema Engine) extracts and manages a library of prioritized schemas, modeling hippocampal-prefrontal (HPC-PFC) function. (B) During inference, the systems iterate. For a command like "walk right twice after turn left," System 2 provides the relevant grammar. System 1 applies one decomposition step per pass, with outputs stored in an episodic memory. This process repeats, reducing the composition tree layer by layer until a primitive action sequence is produced.
  • Figure 2: We apply a Clone-Structured Causal Graph (CSCG) with 100 clones directly to a concatenated subset of SCAN sequences and visualize the resulting model as a simple directed graph.
  • Figure 3: Overview of the schemas extracted by the CSCG extractor, visualized as directed sequence graphs. Note that turn is a special case: it evokes different behavior when combined with around or opposite.
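
The inference loop described in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation. The hand-written rules in `decompose_once` are a hypothetical stand-in for the meta-trained Neural Decomposer, the `memory` list is a simplified stand-in for the episodic memory, and all function and variable names are assumptions made for illustration. It covers only a small subset of SCAN (no "around" or "opposite") and shows just the control flow: one decomposition step per pass, repeated until only primitive actions remain.

```python
# Toy sketch of layer-by-layer decomposition on a SCAN-style command.
# All names here are hypothetical; the rules below replace the learned
# System 1 with fixed pattern matching purely for illustration.

PRIMITIVES = {"walk": "I_WALK", "run": "I_RUN", "jump": "I_JUMP", "look": "I_LOOK"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}
REPEATS = {"twice": 2, "thrice": 3}


def decompose_once(tokens):
    """Apply exactly one decomposition step to a tokenized sub-command.

    Returns a list whose items are either finished action strings or
    token lists that still need further decomposition.
    """
    # Conjunctions sit at the top of the composition tree.
    for conn in ("after", "and"):
        if conn in tokens:
            i = tokens.index(conn)
            left, right = tokens[:i], tokens[i + 1:]
            # "x after y" means: do y first, then x.
            return [right, left] if conn == "after" else [left, right]
    # Repetition modifiers come next.
    if tokens[-1] in REPEATS:
        return [tokens[:-1]] * REPEATS[tokens[-1]]
    # Direction modifiers: turn first, then perform the verb (if any).
    if len(tokens) == 2 and tokens[1] in TURNS:
        verb, direction = tokens
        if verb == "turn":
            return [TURNS[direction]]
        return [TURNS[direction], [verb]]
    # Bare primitive verbs map directly to actions.
    if len(tokens) == 1 and tokens[0] in PRIMITIVES:
        return [PRIMITIVES[tokens[0]]]
    raise ValueError(f"no rule for: {' '.join(tokens)}")


def interpret(command):
    """Iterate single-step decomposition until only actions remain."""
    memory = [command.split()]  # pending sub-commands and finished actions
    passes = 0
    while any(isinstance(item, list) for item in memory):
        next_memory = []
        for item in memory:
            if isinstance(item, list):
                next_memory.extend(decompose_once(item))
            else:
                next_memory.append(item)
        memory = next_memory
        passes += 1
    return memory, passes


actions, passes = interpret("walk right twice after turn left")
print(" ".join(actions), "| passes:", passes)
```

For the caption's example command, each pass strips one layer of the composition tree: the conjunction first, then the repetition, then the direction, then the primitive verb.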