On the Notion that Language Models Reason
Bertram Højer
TL;DR
This paper argues that language models do not perform genuine logical reasoning but operate as implicit finite-order Markov kernels $\kappa_{\theta}$ mapping contexts to token distributions. It reframes reasoning as a form of inference manifested through statistical regularities and approximate invariances in the kernel, rather than as explicit logical computation, and introduces metrics for transformation and inferential invariances using parameters $\epsilon_T$ and $\delta_r$. It proposes a research program using synthetic datasets and toy transformers to study epistemic uncertainty and the conditions under which reasoning-like outputs emerge, while emphasizing the need for precise terminology. By treating reasoning as a subset of inference, the work aims to clarify the capabilities and limitations of LMs and guide future investigations into logical inference and epistemic properties within neural sequence models.
Abstract
Language models (LMs) are said to exhibit reasoning, but what does this entail? We assess definitions of reasoning and how key papers in the field of natural language processing (NLP) use the notion, and argue that the definitions provided are not consistent with how LMs are trained, process information, and generate new tokens. To illustrate this incommensurability, we adopt the view that transformer-based LMs implement an \textit{implicit} finite-order Markov kernel mapping contexts to conditional token distributions. In this view, reasoning-like outputs correspond to statistical regularities and approximate statistical invariances in the learned kernel rather than the implementation of explicit logical mechanisms. This view illustrates the claim that LMs are "statistical pattern matchers" and not genuine reasoners, and it clarifies why reasoning-like outputs arise in LMs without any guarantees of logical consistency. This distinction is fundamental to how epistemic uncertainty is evaluated in LMs. We invite a discussion on the importance of how the computational processes of the systems we build and analyze in NLP research are described.
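One way to make the kernel view concrete is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $\mathcal{V}$ denotes the vocabulary, $k$ the maximum context length, and $\Delta(\mathcal{V})$ the probability simplex over $\mathcal{V}$.

$$
\kappa_{\theta} : \mathcal{V}^{\le k} \to \Delta(\mathcal{V}), \qquad
p_{\theta}(x_{1:n}) = \prod_{t=1}^{n} \kappa_{\theta}\left(x_t \mid x_{\max(1,\,t-k):t-1}\right)
$$

Under this reading, the next-token distribution depends only on the most recent $k$ tokens, which is what makes the process finite-order Markov. The kernel is implicit in the sense that it is computed by the network for each context rather than stored as an explicit transition table.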
