Relational reasoning and inductive bias in transformers and large language models
Jesse Geerts, Andrew Liu, Stephanie Chan, Claudia Clopath, Kimberly Stachenfeld
TL;DR
Relational reasoning in transformers depends on whether relational structure is stored in the weights (in-weights learning, IWL) or must be inferred from context (in-context learning, ICL). IWL yields robust transitive inference (TI) with symbolic distance and terminal-item effects, whereas standard ICL fails to generalize transitively and instead relies on induction circuits for pattern matching. Pre-training transformers on in-context linear regression induces TI during subsequent ICL without forming induction circuits, suggesting that distributed relational representations can support transitive generalization. In large language models, prompts with linear geometry enhance TI, whereas prompts with circular geometry disrupt it when the model has limited stored knowledge to rely on, indicating that geometry-compatible representations underpin TI across model scales. Overall, both training regime and representational geometry critically determine the capacity for TI, in small transformers and LLMs alike.
Abstract
Transformer-based models have demonstrated remarkable reasoning abilities, but the mechanisms underlying relational reasoning remain poorly understood. We investigate how transformers perform \textit{transitive inference}, a classic relational reasoning task that requires inferring relations between indirectly related items (e.g., if $A>B$ and $B>C$, then $A>C$), comparing in-weights learning (IWL) and in-context learning (ICL) strategies. We find that IWL naturally induces a generalization bias towards transitive inference despite training only on adjacent items, whereas ICL models develop induction circuits implementing match-and-copy strategies that fail to encode hierarchical relationships. However, when pre-trained on in-context linear regression tasks, transformers successfully exhibit generalizable in-context transitive inference, displaying both the \textit{symbolic distance} and \textit{terminal item} effects characteristic of human and animal performance, without forming induction circuits. We extend these findings to large language models, demonstrating that prompting with linear geometric scaffolds improves transitive inference, while circular geometries (which violate transitivity by allowing wraparound) impair performance, particularly when models cannot rely on stored knowledge. Together, these results reveal that both the training regime and the geometric structure of induced representations critically determine transformers' capacity for transitive inference.
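
To make the task structure concrete, the following is a minimal illustrative sketch of a transitive inference split: training trials contain only adjacent pairs of a linear hierarchy, and test trials contain the held-out non-adjacent pairs. It is not the authors' implementation. The function name `make_ti_splits`, the seven-item hierarchy, and the trial fields (`distance`, `terminal`) are assumptions chosen for illustration.

```python
from itertools import combinations


def make_ti_splits(items="ABCDEFG"):
    """Split all ordered pairs of a linear hierarchy into train and test trials.

    Assumption for illustration: items are ranked left to right,
    so items[0] > items[1] > ... > items[-1].
    """
    train, test = [], []
    last = len(items) - 1
    for i, j in combinations(range(len(items)), 2):  # i < j, so items[i] outranks items[j]
        trial = {
            "pair": (items[i], items[j]),
            "distance": j - i,                 # symbolic distance between the ranks
            "terminal": i == 0 or j == last,   # pair includes the top or bottom item
        }
        # Adjacent pairs (distance 1) are trained; all others are held out.
        (train if trial["distance"] == 1 else test).append(trial)
    return train, test


train, test = make_ti_splits()
print([t["pair"] for t in train])
print(len(test), "held-out pairs")
```

Under these assumptions, a model showing a symbolic distance effect would be more accurate on test trials with larger `distance`, and a terminal item effect would appear as higher accuracy on trials where `terminal` is true.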
