Probability Distributions Computed by Hard-Attention Transformers

Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, David Chiang

TL;DR

This work characterizes the probability distributions that transformer language models can express when used autoregressively, contrasting real-weighted and Boolean (unweighted) settings and distinguishing classifiers from autoregressors. By formalizing Unique Hard Attention Transformers (UHATs) and mapping them to finite automata and temporal-logic formalisms, the authors derive precise expressivity results: Boolean LTL classifiers and autoregressors are equivalent, while real-valued settings reveal separations such as LTL classifiers yielding only aperiodic step functions and autoregressors aligning with counter-free DFAs but not fully matching weighted NFAs. The paper also shows that autoregression can increase expressive power relative to classifiers in several fragments of LTL and in counting-based temporal logics, highlighting where established Boolean equivalences fail in practice. Overall, the results provide a cohesive framework for understanding transformer language models as probabilistic generators, clarifying the limits and possibilities of their expressivity for real-world language modeling tasks.

Abstract

Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strings autoregressively and probabilistically). Here, we characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing, in their most common use-case as language models.

Paper Structure

This paper contains 28 sections, 17 theorems, 44 equations, 2 figures.

Key Result

Theorem 6.1

UHATs, ${\mathsf{LTL}}$, and cfDFAs define equivalent state encoders.

Figures (2)

  • Figure 1: In the Boolean semiring, equivalences from the literature (Yang et al., 2024; Jerad et al., 2025; Yang et al., 2025) carry over from classifiers to autoregressors; however, sometimes autoregressors are more expressive than classifiers. In the real semiring, $\mathsf{LTL}$ and counter-free DFAs and NFAs become less expressive than counter-free NFAs, and rightmost UHATs are only as expressive as the former. Key: strict inclusion, equivalence. $\neq$ incomparable.
  • Figure 2: (a) A DFA that is counter-free (with $k=2$). (b) A DFA that is not counter-free, because for all $k$, the strings $a^k$ and $a^{k+1}$ have opposite actions. (c) A counter-free weighted NFA that has no equivalent weighted DFA.
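
The counter-free condition in the Figure 2 caption can be checked mechanically. The following is a minimal sketch, assuming the standard reading that a DFA is counter-free when some $k$ makes $w^k$ and $w^{k+1}$ act identically on the states for every string $w$. The function names and the two example automata are hypothetical illustrations and are not taken from the paper; in particular, the three-state example is not claimed to be the automaton drawn in Figure 2(a).

```python
def transition_monoid(n_states, delta):
    """Return every state transformation induced by some string.

    delta maps each symbol to a tuple t, where t[q] is the state
    reached from state q on that symbol (an assumed encoding).
    """
    identity = tuple(range(n_states))
    seen = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in delta.values():
            # Action of (string for f) followed by one more symbol.
            h = tuple(g[f[q]] for q in range(n_states))
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


def is_counter_free(n_states, delta):
    """True if every induced transformation m satisfies m^k == m^(k+1) for some k."""
    for m in transition_monoid(n_states, delta):
        power = m
        # If m only has fixed points as cycles, its powers stabilise
        # within n_states steps; otherwise they cycle forever.
        for _ in range(n_states):
            nxt = tuple(m[power[q]] for q in range(n_states))
            if nxt == power:
                break
            power = nxt
        else:
            return False
    return True


# Hypothetical 3-state chain on 'a': a^2 and a^3 act identically (k = 2).
chain = {"a": (1, 2, 2)}
# Hypothetical 2-state toggle on 'a': a^k and a^(k+1) always differ.
toggle = {"a": (1, 0)}

print(is_counter_free(3, chain))
print(is_counter_free(2, toggle))
```

The check enumerates the transition monoid, so it is exponential in the worst case and intended only for small illustrative automata.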

Theorems & Definitions (50)

  • Definition 4.1
  • Definition 4.2
  • Definition 5.1: Deterministic finite automaton
  • Definition 5.2: Counter-free automaton
  • Definition 5.3: Linear temporal logic
  • Theorem 6.1
  • proof
  • Corollary 6.2
  • proof
  • Definition 6.1
  • ...and 40 more