Transformers in Uniform TC$^0$

David Chiang

TL;DR

This work investigates the fine-grained computational limits of transformer encoders within the circuit class $\mathsf{TC}^0$. It shows that average-hard attention (AHAT) transformers with no approximation, softmax-attention (SMAT) transformers using $O(\mathrm{poly}(n))$-bit precision, and SMAT transformers with absolute error $2^{-O(\mathrm{poly}(n))}$ all reside in $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$, extending prior results that relied on $O(\log n)$-bit precision. The paper achieves this by (i) encoding rational weights for AHATs and proving that the needed arithmetic is computable in $\mathsf{TC}^0$, (ii) formalizing $p$-bit floating-point SMATs and proving that floating-point operations can be implemented in $\mathsf{TC}^0$, and (iii) introducing an error-control framework that preserves exactness or bounded absolute error within $\mathsf{TC}^0$. These results tighten the known upper bound on transformer expressivity, suggesting that even very precise or exact transformer computations do not escape $\mathsf{TC}^0$, with implications for the interpretability of transformer capabilities and for designing precision-aware theoretical analyses. The work also offers a practical perspective by proposing a margin-based definition of SMAT expressivity that aligns with $\mathsf{TC}^0$ recognizability under tight error bounds.

Abstract

Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.
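To make the two attention variants concrete, here is a minimal Python sketch. It assumes a simplified setting that is not taken from the paper: a single query, scalar scores and scalar values, and the hypothetical function names `average_hard_attention` and `softmax_attention`. Average-hard attention averages uniformly over the positions with the maximal score, so it can be evaluated exactly with rationals. Softmax attention is evaluated here in ordinary floating point.

```python
import math
from fractions import Fraction

def average_hard_attention(scores, values):
    """Uniformly average the values at the positions with maximal score.

    Uses exact rationals, so no approximation error is introduced.
    """
    top = max(scores)
    winners = [Fraction(v) for s, v in zip(scores, values) if s == top]
    return sum(winners, Fraction(0)) / len(winners)

def softmax_attention(scores, values):
    """Softmax-weighted average of the values, in floating point."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

scores = [1, 3, 3, 2]
values = [1, 2, 5, 7]
print(average_hard_attention(scores, values))  # 7/2
# Scaling the scores up pushes softmax toward the average-hard result.
print(softmax_attention([10 * s for s in scores], values))
```

In this toy setting the softmax output only approaches the average-hard output as the score gaps grow, which is one way to see why the precision and error assumptions matter for SMATs.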

Paper Structure (9 sections, 10 theorems, 18 equations, 2 figures, 1 table)

Introduction
Background
Transformers
Complexity classes
Approximation error
Arbitrary-precision $\mathsf{AHAT}$s
Polynomial-precision $\mathsf{SMAT}$s
Approximating $\mathsf{SMAT}$s with $2^{-O(\mathsf{poly}(n))}$ error
Limitations and Conclusions

Key Result

Theorem 2

The following operations on $O(\mathsf{poly}(n))$-bit integers are in $\mathsf{TC}^0$:

Figures (2)

Figure 1: Overview of algorithm for iterated addition of $p$-bit floating-point numbers. The summands are grouped into blocks that each span $O(\mathsf{poly}(n))$ bits. They are separated by at least $p+\lceil\log_2 n\rceil$ bits, so that the block-sums are separated by at least $p$ bits.
Figure 2: In Case 2, $s^{(1)}$ is a breakpoint, so the sum $s$ depends on the sign (and only the sign) of $s^{(2)}$. In Case 3, even if $m^{(1)}$ has only a single bit, the remaining block-sums do not affect the whole sum.
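The captions above concern summing many $p$-bit floating-point numbers. As a hedged illustration of why this needs care, the Python sketch below compares rounding once after an exact sum with rounding after every partial sum. It is a reference computation with exact rationals, not the paper's $\mathsf{TC}^0$ algorithm. It assumes round-to-nearest-even, a $p$-bit significand, and an unbounded exponent range, and the paper's floating-point definition may differ. All function names are hypothetical.

```python
from fractions import Fraction

def round_to_p_bits(x, p):
    """Round x to the nearest value m * 2**e with a p-bit integer significand m.

    Assumes round-half-to-even and an unbounded exponent range.
    """
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    a = abs(x)
    e = 0
    # Choose e so that 2**(p-1) <= a / 2**e < 2**p.
    while a / Fraction(2) ** e >= 2 ** p:
        e += 1
    while a / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    m = round(a / Fraction(2) ** e)  # Fraction rounds halves to even
    return sign * m * Fraction(2) ** e

def sum_round_once(xs, p):
    """Exact sum of all summands, rounded a single time at the end."""
    return round_to_p_bits(sum((Fraction(x) for x in xs), Fraction(0)), p)

def sum_round_each(xs, p):
    """Left-to-right sum that rounds after every addition."""
    acc = Fraction(0)
    for x in xs:
        acc = round_to_p_bits(acc + Fraction(x), p)
    return acc

xs = [1024, 1, -1024]
print(sum_round_once(xs, 4))  # 1
print(sum_round_each(xs, 4))  # 0
```

With 4-bit significands, the small summand is lost when rounding happens after each step, because the large terms only cancel later. Rounding once after the exact sum keeps it.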

Theorems & Definitions (26)

Definition 1
Theorem 2
proof
Definition 3
Definition 4
Lemma 5
proof
Lemma 6
proof
Theorem 7
...and 16 more