Table of Contents
Fetching ...

Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers

Andy Yang, David Chiang

TL;DR

The paper develops a formal framework to bound and understand the expressivity of future-masked soft attention transformers by introducing the temporal counting logic $K_{\text{t}}[\#]$ and a RASP variant $\textsf{C-RASP}$, proving their equivalence. It demonstrates that any $K_{\text{t}}[\#]$ formula can be compiled into a standard, future-masked SMAT encoder, establishing a tight lower bound on SMAT expressivity for unbounded input length. It also shows that $K_{\text{t}}[\#]$ subsumes prior lower bounds like $FOC[+;\textsf{MOD}]$ and that fixed-precision transformers can be simulated within this logic, bridging theory with practical transformer implementations. The work provides constructive methods for counting and arithmetic through masked uniform attention and FFNs, enabling interpretable compilation of known algorithms into SMATs and offering a foundation for comparing the expressive power of hard vs soft attention models. Overall, it paves the way for deeper theoretical analysis and practical exploration of transformer capabilities through a precise logical lens.

Abstract

Deriving formal bounds on the expressivity of transformers, as well as studying transformers that are constructed to implement known algorithms, are both effective methods for better understanding the computational power of transformers. Towards both ends, we introduce the temporal counting logic $\textsf{K}_\text{t}$[#] alongside the RASP variant $\textsf{C-RASP}$. We show they are equivalent to each other, and that together they are the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size. We prove this by showing all $\textsf{K}_\text{t}$[#] formulas can be compiled into these transformers.

Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers

TL;DR

The paper develops a formal framework to bound and understand the expressivity of future-masked soft attention transformers by introducing the temporal counting logic and a RASP variant , proving their equivalence. It demonstrates that any formula can be compiled into a standard, future-masked SMAT encoder, establishing a tight lower bound on SMAT expressivity for unbounded input length. It also shows that subsumes prior lower bounds like and that fixed-precision transformers can be simulated within this logic, bridging theory with practical transformer implementations. The work provides constructive methods for counting and arithmetic through masked uniform attention and FFNs, enabling interpretable compilation of known algorithms into SMATs and offering a foundation for comparing the expressive power of hard vs soft attention models. Overall, it paves the way for deeper theoretical analysis and practical exploration of transformer capabilities through a precise logical lens.

Abstract

Deriving formal bounds on the expressivity of transformers, as well as studying transformers that are constructed to implement known algorithms, are both effective methods for better understanding the computational power of transformers. Towards both ends, we introduce the temporal counting logic [#] alongside the RASP variant . We show they are equivalent to each other, and that together they are the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size. We prove this by showing all [#] formulas can be compiled into these transformers.
Paper Structure (40 sections, 18 theorems, 53 equations)

This paper contains 40 sections, 18 theorems, 53 equations.

Key Result

Lemma 4.2

For every string $s$ of length $n$, there exists a formula $\phi_s$ of modal depth $n$ such that $w\vDash \phi_s$ if and only if $w$ contains $s$ as a subsequence.

Theorems & Definitions (53)

  • Definition 3.1: $\textsf{C-RASP}$
  • Example 3.2
  • Definition 4.1
  • Lemma 4.2
  • proof
  • Theorem 4.3
  • proof
  • Definition 5.1
  • Definition 5.2: Transformer Block
  • Lemma 5.3
  • ...and 43 more