Probability Distributions Computed by Hard-Attention Transformers
Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, David Chiang
TL;DR
This work characterizes the probability distributions that transformer language models can express when used autoregressively. It contrasts the real-weighted setting with the Boolean (unweighted) one, and it distinguishes classifiers from autoregressors. The authors formalize Unique Hard Attention Transformers (UHATs) and relate them to finite automata and to linear temporal logic (LTL). In the Boolean setting, LTL classifiers and autoregressors are equivalent. In the real-valued setting they separate: LTL classifiers yield only aperiodic step functions, while autoregressors align with counter-free deterministic finite automata (DFAs) but do not fully match weighted nondeterministic finite automata (NFAs). The paper also shows that autoregression can increase expressive power relative to classifiers in several fragments of LTL and in counting-based temporal logics, showing where equivalences established in the Boolean setting no longer hold. Together, the results give a framework for understanding transformer language models as probabilistic generators and for delimiting what they can and cannot express.
Abstract
Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strings autoregressively and probabilistically). Here, we characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing in their most common use case, as language models.
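To make the recognizer versus language model distinction concrete, here is a minimal sketch that is not taken from the paper. It assumes a toy language a*b* over the alphabet {a, b}, a hypothetical end-of-string marker `EOS`, and hand-picked next-symbol probabilities. The names `accepts`, `next_symbol_dist`, and `string_prob` are illustrative only. The recognizer returns a Boolean, whereas the autoregressive model assigns each string the product of its next-symbol probabilities, ending with the probability of stopping.

```python
from typing import Dict

EOS = "<eos>"  # hypothetical end-of-string marker (an assumption of this sketch)


def accepts(s: str) -> bool:
    """Recognizer view: accept or reject. Toy language a*b* over {a, b}."""
    return set(s) <= {"a", "b"} and "ba" not in s


def next_symbol_dist(prefix: str) -> Dict[str, float]:
    """Autoregressive view: a distribution over the next symbol given a prefix.

    The probabilities are arbitrary illustrative values; each row sums to 1.
    """
    if "b" in prefix:
        # After the first 'b', only 'b' or stopping is allowed.
        return {"b": 0.5, EOS: 0.5}
    return {"a": 0.5, "b": 0.25, EOS: 0.25}


def string_prob(s: str) -> float:
    """Probability of the whole string: product of next-symbol probabilities,
    times the probability of emitting EOS after the last symbol."""
    prob = 1.0
    for i, sym in enumerate(s):
        prob *= next_symbol_dist(s[:i]).get(sym, 0.0)
    return prob * next_symbol_dist(s).get(EOS, 0.0)
```

In this sketch, strings rejected by `accepts` receive probability zero under `string_prob`, while accepted strings receive different positive weights. That extra weight information is what a purely Boolean recognizer cannot represent.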
