Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
TL;DR
This work establishes unconditional lower bounds on the length of chain-of-thought (CoT) reasoning in unique hard-attention transformers (UHAT) across several algorithmic tasks, proving that in many cases the CoT must grow at least linearly with input size. Using random-restriction methods and a depth-reduction argument, the authors show that a sublinear CoT would force the model's output to be constant on large subsets of inputs, which is incompatible with parity-like functions. They apply these generic bounds to concrete problems (Parity, Multiplication, Median, and Reachability), obtaining $\Omega(N)$ or $\Omega(N \log N)$ lower bounds on CoT length, and show that these bounds are tight up to polylogarithmic factors; they also analyze the limitations of dot-by-dot (filler-token) CoTs. Complementing the theory with experiments on synthetic transformers and pretrained LLMs, the paper argues that sublinear CoTs are unlikely to suffice for the investigated tasks, motivating tool use or architectural innovations for efficient reasoning. Overall, the results delineate the power and limits of CoT reasoning in transformers and quantify the inference-time compute needed for robust algorithmic reasoning.
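The intuition behind the restriction argument can be illustrated on tiny inputs. The sketch below is plain Python with hypothetical helper names (`random_restriction`, `is_constant_under`, and so on are not from the paper). It fixes all but a few input bits at random and checks whether a Boolean function becomes constant on the remaining free bits. Parity never does as long as at least one bit stays free, whereas the AND of all bits collapses to a constant as soon as any fixed bit is 0. This only illustrates the property the lower bounds exploit; it is not the paper's proof and does not model a UHAT transformer.

```python
import itertools
import random


def parity(bits):
    """XOR of all bits; the value flips whenever any single bit flips."""
    result = 0
    for b in bits:
        result ^= b
    return result


def and_all(bits):
    """AND of all bits; fixing any single bit to 0 makes it constant."""
    return int(all(b == 1 for b in bits))


def random_restriction(n, num_free, rng):
    """Fix all but `num_free` positions to random bits; None marks a free position."""
    free = set(rng.sample(range(n), num_free))
    return [None if i in free else rng.randint(0, 1) for i in range(n)]


def outputs_under(f, restriction):
    """Evaluate f on every completion of the free positions and collect the outputs."""
    free_positions = [i for i, v in enumerate(restriction) if v is None]
    outputs = set()
    for assignment in itertools.product((0, 1), repeat=len(free_positions)):
        x = list(restriction)
        for pos, bit in zip(free_positions, assignment):
            x[pos] = bit
        outputs.add(f(x))
    return outputs


def is_constant_under(f, restriction):
    """True if f takes a single value on all inputs consistent with the restriction."""
    return len(outputs_under(f, restriction)) == 1


if __name__ == "__main__":
    rng = random.Random(0)
    n, num_free, trials = 16, 3, 200
    parity_const = and_const = 0
    for _ in range(trials):
        rho = random_restriction(n, num_free, rng)
        parity_const += is_constant_under(parity, rho)
        and_const += is_constant_under(and_all, rho)
    print(f"parity constant in {parity_const}/{trials} restrictions")
    print(f"AND constant in {and_const}/{trials} restrictions")
```

The parity count is always 0, since every restriction with a free bit leaves a parity of the free bits; the AND count is typically at or near the number of trials, since a single fixed 0 already makes it constant.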
Abstract
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of chain-of-thought steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
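To make "number of chain-of-thought steps" concrete, here is one natural scratchpad for Parity, sketched in Python under the assumption that the scratchpad records running XORs (a common textbook construction, not necessarily the one analyzed in the paper; `parity_with_scratchpad` is a hypothetical name). Each step reads one input bit and the previous scratchpad token, and the scratchpad has exactly N steps, so its length is linear in the input size, consistent with the claim that sublinear CoTs do not suffice for parity-like functions.

```python
import random


def parity_with_scratchpad(bits):
    """Compute parity of `bits` via a scratchpad of running XORs.

    Step i writes the parity of bits[0..i], which depends only on input bit i
    and the previous scratchpad token. The scratchpad has exactly len(bits)
    steps, and its last token is the answer (0 for empty input).
    """
    scratchpad = []
    running = 0
    for b in bits:
        running ^= b  # combine the previous CoT token with one new input bit
        scratchpad.append(running)
    answer = scratchpad[-1] if scratchpad else 0
    return scratchpad, answer


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (4, 16, 64):
        x = [rng.randint(0, 1) for _ in range(n)]
        steps, ans = parity_with_scratchpad(x)
        assert ans == sum(x) % 2
        print(f"N={n}: {len(steps)} CoT steps, parity={ans}")
```

For example, `parity_with_scratchpad([1, 0, 1, 1])` returns `([1, 1, 0, 1], 1)`.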
