Tighter Bounds on the Expressivity of Transformer Encoders
David Chiang, Peter Cholak, Anand Pillay
TL;DR
The paper tightens the known bounds on transformer expressivity by linking transformer encoders to a precise logical framework. It defines the logic $\mathsf{FOC}[+;\mathsf{MOD}]$ and proves that it is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders, bringing the characterization closer to an exact description of the languages transformers can recognize. The results hinge on a robust normal form for the logic and on constructive translations between transformers and logical formulas, with explicit constructions and complexity analyses. By relating the logic to uniform $\mathsf{TC}^0$ and SSCMs, the work clarifies the computational boundaries of transformer architectures and lays groundwork for future complete characterizations and architecture variants.
Abstract
Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.
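To give a rough feel for what a first-order logic with counting quantifiers can express, the sketch below evaluates a few counting-style conditions over a string. It is only an illustration under stated assumptions: the helper names (`count`, `is_symbol`, `mod`, `both`), the 0-indexed positions, and the three example languages are hypothetical and are not the paper's syntax or constructions.

```python
from typing import Callable

# A position predicate takes a string w and a position p and returns a truth value.
Predicate = Callable[[str, int], bool]


def count(w: str, pred: Predicate) -> int:
    """Counting term: the number of positions p in w where pred holds."""
    return sum(1 for p in range(len(w)) if pred(w, p))


def is_symbol(a: str) -> Predicate:
    """Predicate that holds at positions carrying the symbol a."""
    return lambda w, p: w[p] == a


def mod(r: int, m: int) -> Predicate:
    """Modular predicate on positions: p is congruent to r modulo m (0-indexed, an assumption)."""
    return lambda w, p: p % m == r


def both(f: Predicate, g: Predicate) -> Predicate:
    """Conjunction of two position predicates."""
    return lambda w, p: f(w, p) and g(w, p)


def majority(w: str) -> bool:
    """Comparison of two counts: more a's than b's."""
    return count(w, is_symbol("a")) > count(w, is_symbol("b"))


def sum_equals(w: str) -> bool:
    """Addition on counts: (#a) + (#b) equals (#c)."""
    return count(w, is_symbol("a")) + count(w, is_symbol("b")) == count(w, is_symbol("c"))


def even_as_match_bs(w: str) -> bool:
    """Counting combined with a modular predicate: a's at even positions equal the number of b's."""
    return count(w, both(is_symbol("a"), mod(0, 2))) == count(w, is_symbol("b"))
```

Each function treats a count as an ordinary integer and then compares or adds counts, which is the general flavor of condition that counting quantifiers, addition, and modular predicates make available.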
