Autoregressive Large Language Models are Computationally Universal

Dale Schuurmans; Hanjun Dai; Francesco Zanini

TL;DR

This work demonstrates that an unaided large language model can perform universal computation through extended autoregressive decoding. It introduces Lag systems as a bridge between language-model decoding and classical computation, constructs a universal Lag system with 2027 production rules from the small universal Turing machine $U_{15,2}$, and shows that a single system prompt drives gemini-1.5-pro-001, under deterministic (greedy) decoding, to apply each of those rules correctly. By the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive decoding is therefore a general-purpose computer, with implications for program synthesis and non-formal problem-solving. The work also clarifies the limits of generalized autoregressive decoding, which only reaches linear bounded automata, and shows that universality can be verified by checking a compact, finite rule set.

Abstract

We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
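The generalized decoding loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `generalized_decode`, `toy_model`, and the `"HALT"` stopping token are hypothetical, the toy model is a plain function rather than a transformer, and the sketch assumes the input is at least as long as the context.

```python
from typing import Callable, List, Sequence


def generalized_decode(
    next_token: Callable[[Sequence[str]], str],
    tokens: Sequence[str],
    context_len: int,
    halt: str,
    max_steps: int = 1000,
) -> List[str]:
    """Slide a fixed-size window over the sequence, appending one token per step."""
    if len(tokens) < context_len:
        raise ValueError("this sketch assumes the input is at least as long as the context")
    seq = list(tokens)
    for k in range(max_steps):
        # The context for the (k+1)st output starts at 0-based index k.
        window = seq[k : k + context_len]
        out = next_token(window)
        if out == halt:  # hypothetical stopping convention, not from the paper
            break
        seq.append(out)  # emitted tokens go to the END of the sequence
    return seq


def toy_model(window: Sequence[str]) -> str:
    """Hypothetical deterministic stand-in for a model: upper-case the first token."""
    return "HALT" if window[0] == "#" else window[0].upper()


result = generalized_decode(toy_model, ["a", "b", "#"], context_len=2, halt="HALT")
```

The point of the sketch is that the sequence itself acts as memory: the model only ever sees `context_len` tokens, but whatever it emits is read again once the window reaches the end of the sequence.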

Paper Structure

This paper contains 12 sections, 13 theorems, 17 equations, 3 figures, 1 table, 4 algorithms.

Introduction
Autoregressive decoding
Lag systems
Memory access control with a Lag system
Counterclockwise position control
Clockwise position control
Turing machines
Simulating a Turing machine with a Lag system
A universal Lag system
Simulating a universal Lag system with a language model
Related work
Conclusion

Key Result

Proposition 1

For any deterministic language model $M$ with context length $N$, there exists a deterministic Lag system $L$ such that, for any input string $s$ with $|s|\geq N$, the execution of the Lag system algorithm (alg:lag) with $L$ simulates extended autoregressive decoding of $M$ with the decoding algorithm (alg:auto), in the sense that

Figures (3)

Figure 1: Generalized autoregressive decoding when the length of the input sequence $n$ exceeds the context length $N$. Here $k$ is the number of output symbols that have already been appended. The figure depicts the generation of the $(k+1)$st output symbol conditioning on the context of length $N$ starting at index $k+1$, followed by the generation of the $(k+2)$nd output symbol conditioning on the context of length $N$ starting at index $k+2$. This process reduces to standard $N$-Markov autoregressive decoding when $n\leq N$.
Figure 2: Extended autoregressive decoding when two output symbols are generated for a given context. Here $\ell$ is the number of output symbols that have already been appended after $k$ iterations, with $\ell\geq k$, and it is assumed the length of the input sequence $n$ exceeds the context length $N$.
Figure 3: Prompting strategy for simulating a universal Lag system. The system prompt consists of copies of the entire rule set and is prepended to the prompt used in each call to the language model. A sliding context window of size $2$ is moved through the symbol sequence, emitting $1$ or $2$ symbols that are appended to the end of the sequence in each iteration. (Note that each symbol is a pair of tokens.) The sliding context window advances $1$ position per iteration. For example, in the first row, the prompt given to the language model on the initial iteration is $\mathit{systemprompt}\; s_1 s_2$; for the next iteration, the prompt is $\mathit{systemprompt}\; s_2 s_3$, then $\mathit{systemprompt}\; s_3 s_4$ in the subsequent iteration, and so on.
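The window-of-two loop in the Figure 3 caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's construction: `run_lag_system` and `TOY_RULES` are hypothetical names, the two toy rules are invented for the example (they are not among the 2027 rules), a dictionary lookup stands in for the language-model call, and halting when no rule matches is an assumed convention.

```python
from typing import Dict, List, Sequence, Tuple

# A rule maps a window of two symbols to the 1 or 2 symbols to append.
Rules = Dict[Tuple[str, str], Tuple[str, ...]]


def run_lag_system(
    rules: Rules, symbols: Sequence[str], max_steps: int = 10_000
) -> Tuple[List[str], int]:
    """Return the full symbol sequence and the final window position."""
    seq = list(symbols)
    pos = 0
    for _ in range(max_steps):
        if pos + 1 >= len(seq):  # fewer than two symbols left to read
            break
        window = (seq[pos], seq[pos + 1])
        # In the prompting setup this lookup would be a model call on
        # "system prompt + window"; here a dict plays that role (assumption).
        out = rules.get(window)
        if out is None:  # assumed halting convention: no matching rule
            break
        if not 1 <= len(out) <= 2:
            raise ValueError("each rule must emit 1 or 2 symbols")
        seq.extend(out)  # append the emitted symbols to the end
        pos += 1  # the window advances one position per iteration
    return seq, pos


# Hypothetical toy rule set, for illustration only.
TOY_RULES: Rules = {("a", "b"): ("b",), ("b", "b"): ("a", "a")}

final_seq, final_pos = run_lag_system(TOY_RULES, ["a", "b"])
```

Starting from `a b`, the window reads `(a, b)` and appends `b`, then reads `(b, b)` and appends `a a`, then stops at `(b, a)` because no toy rule matches. Because the window only moves one position while up to two symbols are appended, the unread part of the sequence can grow, which is what lets such a system use unbounded memory.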

Theorems & Definitions (13)

Proposition 1
Proposition 2
Proposition 3
Proposition 4
Proposition 5
Lemma 6
Theorem 7
Corollary 8
Corollary 9
Corollary 10
...and 3 more
