How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, Liwei Wang
TL;DR
The paper analyzes how numerical precision affects arithmetical reasoning in autoregressive Transformers, focusing on three base-$p$ tasks: $\mathrm{ADD}_p(n)$, $\mathrm{IterADD}_p(n,k)$, and $\mathrm{MUL}_p(n,l)$. It establishes a dichotomy. Constant-precision Transformers are subject to $\mathsf{AC}^0$-level limitations and require super-polynomial size to handle $\mathrm{IterADD}_p(n,k)$ and $\mathrm{MUL}_p(n,l)$. Logarithmic (standard) precision is far more expressive: it enables $\mathrm{ADD}_p(n)$ and $\mathrm{IterADD}_p(n,k)$ with constant-depth, constant-dimension architectures and $\mathrm{MUL}_p(n,l)$ with polynomial-size hidden layers, while expressiveness under logarithmic precision is upper-bounded by $\mathsf{TC}^0$. The authors corroborate the theory with experiments on base-2 and base-10 inputs, showing that reduced precision disproportionately harms the more complex arithmetic tasks; additional LLaMA-based experiments with LoRA/QLoRA further confirm precision as a key factor. Overall, the work highlights numerical precision as a crucial design consideration for robust mathematical reasoning in LLMs, linking practical performance to the circuit complexity classes $\mathsf{AC}^0$ and $\mathsf{TC}^0$.
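To make the three tasks concrete, here is a minimal reference sketch in Python. It is not the paper's construction. The helper names (`to_digits`, `from_digits`, `add_p`, `iter_add_p`, `mul_p`) and the little-endian digit-list encoding are assumptions made for illustration, as is reading $l$ in $\mathrm{MUL}_p(n,l)$ as the number of low-order output digits kept.

```python
def to_digits(x, p, width):
    """Little-endian base-p digits of x, padded to `width` digits."""
    digits = []
    for _ in range(width):
        x, r = divmod(x, p)
        digits.append(r)
    return digits


def from_digits(digits, p):
    """Inverse of to_digits: little-endian base-p digits to an integer."""
    value = 0
    for d in reversed(digits):
        value = value * p + d
    return value


def add_p(a, b, p):
    """ADD_p(n): add two n-digit base-p numbers; returns n + 1 digits."""
    out, carry = [], 0
    for da, db in zip(a, b):
        carry, digit = divmod(da + db + carry, p)  # carry is always 0 or 1
        out.append(digit)
    out.append(carry)
    return out


def iter_add_p(nums, p):
    """IterADD_p(n, k): add k n-digit base-p numbers."""
    n = len(nums[0])
    out, carry = [], 0
    for i in range(n):
        # Unlike ADD_p, the carry here can grow with k.
        carry, digit = divmod(sum(x[i] for x in nums) + carry, p)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, p)
        out.append(digit)
    return out


def mul_p(a, b, p, l):
    """MUL_p(n, l): multiply two n-digit numbers, keep the l low-order digits
    (assumed reading of l)."""
    n = len(a)
    acc = [0] * (2 * n)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            acc[i + j] += da * db  # column sums before carry propagation
    out, carry = [], 0
    for s in acc:
        carry, digit = divmod(s + carry, p)
        out.append(digit)
    return out[:l]
```

For example, `from_digits(iter_add_p([to_digits(99, 10, 2)] * 3, 10), 10)` evaluates to `297`. The intermediate column sums and carries in `iter_add_p` and `mul_p` grow with the number of operands, which gives some intuition for why these two tasks are more sensitive to precision than plain addition.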
Abstract
Despite the remarkable success of Transformer-based large language models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of LLMs' mathematical abilities, with a specific focus on their arithmetic performance. We identify numerical precision as a key factor that influences their effectiveness in arithmetical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes. We further support our theoretical findings through empirical experiments that explore the impact of varying numerical precision on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.
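As a loose analogy for why low precision hurts iterated addition, the sketch below accumulates a running sum while rounding it to IEEE 754 half precision after every step. This is only an illustration of precision loss in a single accumulator, not the paper's Transformer model or proof technique; the helper names `to_half` and `accumulate` are hypothetical.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(values, rounder=None):
    """Sum `values` left to right, optionally rounding the running total."""
    total = 0.0
    for v in values:
        total += v
        if rounder is not None:
            total = rounder(total)  # simulate a low-precision accumulator
    return total


ones = [1.0] * 5000
exact = accumulate(ones)           # double precision: exactly 5000.0
low = accumulate(ones, to_half)    # half-precision running total
print(exact, low)
```

The half-precision total stalls well below 5000, because consecutive half-precision values above 2048 are more than 1 apart, so adding 1 no longer changes the stored sum.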
