Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans, Hanjun Dai, Francesco Zanini
TL;DR
This work demonstrates that an unaided large language model can perform universal computation through extended autoregressive decoding. It introduces Lag systems as a bridge between language-model decoding and classical computation, constructs a universal Lag system from the small universal Turing machine U_{15,2}, and shows that, under greedy decoding, gemini-1.5-pro-001 driven by a single system prompt deterministically executes all 2027 production rules. Under the Church-Turing thesis, this establishes that a capable LLM, prompted this way and run with extended decoding, is a general-purpose computer, with implications for program synthesis and non-formal problem-solving. The work also clarifies the limits of standard generalized autoregressive decoding, which reaches only linear bounded automata, and highlights a path to verifying universality with a compact rule set.
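To make the notion of a Lag system concrete, the following is a minimal sketch, assuming the classical queue semantics: at each step the system reads the first `lag` symbols, appends the output string that the matching production rule assigns to that prefix, and deletes only the first symbol. The function name `run_lag_system`, the halting convention (stop when fewer than `lag` symbols remain or no rule matches), and the three-rule toy system are hypothetical illustrations; they are not the 2027-rule universal system described in the paper.

```python
from collections import deque


def run_lag_system(rules, word, lag, max_steps=10_000):
    """Simulate a deterministic Lag system and return the queue after each step.

    Hypothetical sketch: halts when fewer than `lag` symbols remain or when
    no production rule matches the current prefix (an assumed convention).
    """
    queue = deque(word)
    trace = ["".join(queue)]
    for _ in range(max_steps):
        if len(queue) < lag:
            break  # not enough symbols left to read a full prefix
        prefix = "".join(queue[i] for i in range(lag))
        if prefix not in rules:
            break  # no applicable production rule: halt
        queue.extend(rules[prefix])  # append the rule's output to the end
        queue.popleft()              # delete only the first symbol
        trace.append("".join(queue))
    return trace


# Toy 2-lag system over {a, b}; "ba" deliberately has no rule.
toy_rules = {"aa": "b", "ab": "b", "bb": ""}
print(run_lag_system(toy_rules, "aab", lag=2))
```

For the toy input `"aab"`, the queue evolves as `aab`, `abb`, `bbb`, `bb`, `b`, and then halts because a single symbol cannot fill a 2-symbol prefix. Unlike a tag system, which deletes several symbols per step, a Lag system reads several symbols but deletes only one.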
Abstract
We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. Using a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, gemini-1.5-pro-001, when prompted in this way and run with extended autoregressive (greedy) decoding, is a general-purpose computer.
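The generalized decoding loop described above can be sketched as follows, under stated assumptions: the full token sequence is append-only, the model sees only a bounded window of `window` tokens starting at a moving position `start`, its emitted tokens go to the end of the whole sequence, and the window then advances by one position. The names `extended_decode` and `greedy_table` are hypothetical; the lookup table merely stands in for a prompted model under greedy decoding (a deterministic map from context to output), and returning `None` as a halt signal is an assumed convention.

```python
from typing import Callable, Optional, Tuple


def extended_decode(
    model: Callable[[str], Optional[str]],
    prompt: str,
    window: int,
    max_steps: int = 10_000,
) -> Tuple[str, str]:
    """Generalized autoregressive decoding with a bounded context (sketch).

    Returns (full_sequence, unread_suffix). Nothing is ever deleted from the
    sequence; only the start of the context window moves forward.
    """
    seq = list(prompt)
    start = 0
    for _ in range(max_steps):
        context = "".join(seq[start:start + window])
        if len(context) < window:
            break  # window cannot be filled: halt (assumed convention)
        emitted = model(context)
        if emitted is None:
            break  # model signals halt (hypothetical convention)
        seq.extend(emitted)  # append output to the END of the sequence
        start += 1           # advance the context window by one token (assumed)
    return "".join(seq), "".join(seq[start:])


# Deterministic stand-in for a greedily decoded model (hypothetical table).
greedy_table = {"aa": "b", "ab": "b", "bb": ""}
print(extended_decode(greedy_table.get, "aab", window=2))
```

Because the model's output depends only on the current window, the unread suffix `seq[start:]` behaves exactly like the queue of a Lag system whose lag equals the window size: reading the window is reading the queue prefix, appending emitted tokens is applying a production rule, and advancing the window is deleting the first symbol. For the toy table, the full sequence grows to `"aabbb"` while the unread suffix ends as `"b"`.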
