Relational reasoning and inductive bias in transformers and large language models

Jesse Geerts, Andrew Liu, Stephanie Chan, Claudia Clopath, Kimberly Stachenfeld

TL;DR

Relational reasoning in transformers hinges on whether relational structure is stored in the weights (in-weights learning, IWL) or must be inferred from context (in-context learning, ICL). IWL yields robust transitive inference (TI) with symbolic distance and terminal-item effects, while standard ICL fails to generalize TI and instead relies on induction circuits for pattern matching. Pre-training transformers on in-context linear regression induces TI during subsequent ICL without forming induction circuits, suggesting that distributed relational representations can support transitive generalization. In large language models, prompts that impose a linear geometry enhance TI, whereas prompts that impose a circular geometry disrupt it when the model cannot rely on stored knowledge, indicating that geometry-compatible representations underpin TI across model scales. Overall, both the training regime and the representational geometry critically determine the capacity for TI, in small transformers and in LLMs alike.

Abstract

Transformer-based models have demonstrated remarkable reasoning abilities, but the mechanisms underlying relational reasoning remain poorly understood. We investigate how transformers perform \textit{transitive inference}, a classic relational reasoning task that requires inferring relations between indirectly related items (e.g., if $A>B$ and $B>C$, then $A>C$), comparing in-weights learning (IWL) and in-context learning (ICL) strategies. We find that IWL naturally induces a generalization bias towards transitive inference despite training only on adjacent items, whereas ICL models develop induction circuits implementing match-and-copy strategies that fail to encode hierarchical relationships. However, when pre-trained on in-context linear regression tasks, transformers successfully exhibit in-context generalizable transitive inference, displaying both \textit{symbolic distance} and \textit{terminal item effects} characteristic of human and animal performance, without forming induction circuits. We extend these findings to large language models, demonstrating that prompting with linear geometric scaffolds improves transitive inference, while circular geometries (which violate transitivity by allowing wraparound) impair performance, particularly when models cannot rely on stored knowledge. Together, these results reveal that both the training regime and the geometric structure of induced representations critically determine transformers' capacity for transitive inference.
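
As a rough illustration of the task, the split between adjacent training pairs and non-adjacent test pairs can be sketched as below. This is a minimal sketch, not the authors' code: the seven-item hierarchy `ABCDEFG` is inferred from the figure captions (signed symbolic distance from -6 for GA to +6 for AG), and the names `ITEMS`, `RANK`, `signed_distance`, and `label` are hypothetical.

```python
from itertools import permutations

# Assumed 7-item hierarchy with A > B > ... > G (letters are placeholders).
ITEMS = "ABCDEFG"
RANK = {item: i for i, item in enumerate(ITEMS)}

def signed_distance(first, second):
    """Signed symbolic distance: +6 for the pair AG, -6 for GA."""
    return RANK[second] - RANK[first]

def label(first, second):
    """+1 if the first item is larger than the second, -1 otherwise."""
    return 1 if signed_distance(first, second) > 0 else -1

# All ordered pairs of distinct items.
pairs = list(permutations(ITEMS, 2))

# Training uses only adjacent pairs; transitive inference is tested
# on the remaining, non-adjacent pairs.
train_pairs = [p for p in pairs if abs(signed_distance(*p)) == 1]
test_pairs = [p for p in pairs if abs(signed_distance(*p)) > 1]
```

A model that has only seen `train_pairs` shows transitive inference if it assigns `label(first, second)` correctly on `test_pairs`; grouping test accuracy by `abs(signed_distance(...))` is one way to look for a symbolic distance effect.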

Paper Structure

This paper contains 35 sections, 11 figures.

Figures (11)

  • Figure 1: (A) Transitive inference setup with Omniglot images. First row shows an example hierarchy. Second row shows example evaluation of non-adjacent pair. (B) Transitive inference as a sequence. During training, the model is presented with a sequence defining the "hierarchy" (which items are larger than which), followed by a "query" consisting of an adjacent pair of items. The model is trained to categorize the order of the query pair: +1 if the first item is larger than the second, -1 if it is smaller. (C) Model architecture: two-layer attention-only transformer. (D) Illustration of the training set (adjacent pairs) and test set (non-adjacent pairs). Color indicates whether the first item is larger (+1) or smaller (-1). (E) Example accuracy on all training and test pairs (data from rhesus macaques). Figure reprinted from lippl_mathematical_2024, data from jensen_implicit_2015.
  • Figure 2: In-weights learning experiments. (A) Training and evaluation setup: the hierarchy is fixed across all sequences, and the context examples are randomly drawn and irrelevant for query prediction. (B) Training loss. (C) Training accuracy. (D) Final model accuracy for each pairwise comparison, sorted by symbolic distance. (E) Principal component analysis of the final hidden layer activations. Colors show signed symbolic distance from -6 (GA, red) to +6 (AG, blue).
  • Figure 3: In-context learning experiments. (A) Training and evaluation setup. (B) Training loss. (C) Training accuracy. (D) Final model accuracy for each pairwise comparison, sorted by symbolic distance. Dashed gray line shows chance level. (E) Principal component analysis of the final hidden layer activations. Colors show signed symbolic distance from -6 (GA, red) to +6 (AG, blue).
  • Figure 4: (A) Layer 1 attention pattern during the in-context TI task. (B) Induction strength of each layer 2 head during evaluation of the TI task with adjacent queries. (C) Induction strength versus accuracy after ablating head $i$. (D) Induction strength versus accuracy after ablating all but head 0 and head $i$.
  • Figure 5: (A) Pre-training setup for in-context linear regression (the evaluation setup was the same as in Figure 3). (B) Post-training model accuracy for each pairwise comparison, sorted by symbolic distance. (C) Induction strength of each head throughout training. (D) Principal component analysis of the final hidden layer activations. Colors show signed symbolic distance from -6 (GA, red) to +6 (AG, blue).
  • ...and 6 more figures
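
The sequence format described in Figure 1(B), a context that defines the hierarchy followed by an adjacent query pair, might be sketched as follows for the in-context setting. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes the hidden order is reshuffled for every sequence, that the context lists each adjacent pair in both orders with its label, and that items are plain letters rather than Omniglot images. The function name `make_icl_sequence` is hypothetical.

```python
import random

def make_icl_sequence(items, rng):
    """Build one in-context training sequence (assumed format)."""
    order = list(items)
    rng.shuffle(order)  # order[0] is the largest item in this sequence

    # Context: every adjacent pair in both orders, with label +1 if the
    # first item is larger than the second and -1 if it is smaller.
    context = []
    for larger, smaller in zip(order, order[1:]):
        context.append((larger, smaller, +1))
        context.append((smaller, larger, -1))
    rng.shuffle(context)

    # Training query: an adjacent pair, so its answer appears in the context.
    query_first, query_second, target = rng.choice(context)
    return order, context, (query_first, query_second), target

rng = random.Random(0)
order, context, query, target = make_icl_sequence("ABCDEFG", rng)
```

Because every training query also appears in the context, a match-and-copy strategy is enough to solve training sequences of this form, which is consistent with the induction-circuit solution reported for the ICL models; non-adjacent test queries have no matching context entry to copy.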