Tighter Bounds on the Expressivity of Transformer Encoders

David Chiang; Peter Cholak; Anand Pillay

TL;DR

The paper tightens the known bounds on transformer expressivity by relating transformer encoders to a logic. It defines $\mathsf{FOC}[\mathord+;\mathsf{MOD}]$, a variant of first-order logic with counting quantifiers, and proves that it is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings the characterization closer to an exact description of the languages that transformer encoders recognize. The results rest on a normal form for the logic and on constructive translations between transformers and logical formulas, with explicit constructions and complexity analyses. Because $\mathsf{FOC}[\mathord+;\mathsf{MOD}]$ is a tighter upper bound than uniform $\mathsf{TC}^0$ and a tighter lower bound than SSCMs, the work clarifies the computational boundaries of transformer architectures and lays groundwork for future exact characterizations and for the study of architecture variants.

Abstract

Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $\mathsf{TC}^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.
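To make "first-order logic with counting quantifiers" concrete, here is a minimal Python sketch of how a counting quantifier of the form $\exists^{=x} p.\, \phi$ might be evaluated on a string. It assumes the usual reading ("exactly $x$ positions $p$ satisfy $\phi$"), assumes count variables range over $0, \ldots, |w|$, and uses 0-based positions. The function names and the example language (equally many `a`s and `b`s) are illustrative choices, not taken from the paper.

```python
from typing import Callable


def count_positions(w: str, pred: Callable[[int], bool]) -> int:
    """Return how many positions p of w satisfy pred(p)."""
    return sum(1 for p in range(len(w)) if pred(p))


def exists_exactly(w: str, x: int, pred: Callable[[int], bool]) -> bool:
    """Assumed semantics of the counting quantifier: exactly x positions satisfy pred."""
    return count_positions(w, pred) == x


def equal_as_and_bs(w: str) -> bool:
    """Brute-force check of a formula shaped like
        exists x. exists y. (exists^{=x} p. Q_a(p)) and (exists^{=y} p. Q_b(p)) and x = y
    where Q_a(p) means "position p holds the symbol a".
    Count variables x, y are assumed to range over 0..len(w).
    """
    n = len(w)
    return any(
        exists_exactly(w, x, lambda p: w[p] == "a")
        and exists_exactly(w, y, lambda p: w[p] == "b")
        and x == y
        for x in range(n + 1)
        for y in range(n + 1)
    )


if __name__ == "__main__":
    for s in ["abab", "aab", "", "bca"]:
        print(repr(s), equal_as_and_bs(s))
```

The nested search over `x` and `y` is deliberately naive: it mirrors the quantifier structure of the formula rather than the most direct way to compare two counts.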

Paper Structure (50 sections, 22 theorems, 47 equations, 1 figure)

Introduction
Preliminaries
Transformers
Input layer
Hidden layers
Stacks, encoders and classifiers
First-Order Logic with Counting Quantifiers
Examples
Definition
Normal form
Proof of the normal form theorem
Case $\exists x. \phi$:
Case $\exists^{{}=x} p. \phi$:
From Transformers to $\mathsf{FOC}[\mathord+;\mathsf{MOD}]$
Representing numbers
...and 35 more sections

Key Result

Theorem 1

Every formula $\phi$ of $\mathsf{FOC}[\mathord+;\mathsf{MOD}]$ is equivalent to a formula of the form where

Figures (1)

Figure 1: Overview of results. Arrows indicate inclusion, and thick arrows indicate strict inclusion. We show that $\mathsf{FOC}[\mathord+;\mathsf{MOD}]$ is simultaneously a tighter upper bound on fixed-precision transformer encoders than uniform $\mathsf{TC}^0$ is, and a tighter lower bound on transformer encoders than SSCMs are.
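Read as inclusions between language classes, the caption can be summarized as follows. This is a sketch based only on the caption: $\mathcal{L}(\cdot)$ is notation introduced here for "the class of languages recognized by", and $\subseteq$ is used throughout because the caption does not say which arrows in the figure are strict.

$$
\begin{aligned}
\mathcal{L}(\text{fixed-precision transformer encoders}) &\subseteq \mathcal{L}(\mathsf{FOC}[\mathord+;\mathsf{MOD}]) \subseteq \text{uniform } \mathsf{TC}^0 \\
\mathcal{L}(\text{SSCMs}) &\subseteq \mathcal{L}(\mathsf{FOC}[\mathord+;\mathsf{MOD}]) \subseteq \mathcal{L}(\text{transformer encoders})
\end{aligned}
$$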

Theorems & Definitions (54)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Definition 7
Definition 8
Definition 9
Theorem 1
...and 44 more
