How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs

Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, Liwei Wang

TL;DR

The paper analyzes how numerical precision affects arithmetical reasoning in autoregressive Transformers, focusing on three base-$p$ tasks: $\mathrm{ADD}_p(n)$, $\mathrm{IterADD}_p(n,k)$, and $\mathrm{MUL}_p(n,l)$. It establishes a dichotomy. Constant-precision Transformers are subject to $\mathsf{AC}^0$-level limitations and require super-polynomial size to handle $\mathrm{IterADD}_p(n,k)$ and $\mathrm{MUL}_p(n,l)$. Logarithmic/standard precision is far more expressive: it enables $\mathrm{ADD}_p(n)$ and $\mathrm{IterADD}_p(n,k)$ with constant-depth, constant-dimension architectures and $\mathrm{MUL}_p(n,l)$ with polynomial-size hidden layers, and the upper bound under logarithmic precision is $\mathsf{TC}^0$. The authors corroborate the theory with experiments on base-2 and base-10 inputs, which show that reduced precision disproportionately harms the more complex arithmetic tasks. Additional LLAMA-based experiments with LoRA/QLoRA further confirm precision as a key factor. Overall, the work highlights numerical precision as a crucial design consideration for robust mathematical reasoning in LLMs, linking practical performance to the foundational circuit complexity classes $\mathsf{AC}^0$ and $\mathsf{TC}^0$.
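To make the three tasks concrete, here is a minimal reference sketch of base-$p$ addition, iterated addition, and multiplication on digit lists. It only illustrates the input/output behaviour of the tasks, not the paper's Transformer constructions. The function names, the most-significant-digit-first layout, and the reading of `l` as "keep the `l` least-significant output digits" are assumptions made for this example.

```python
def to_digits(value, p):
    """Base-p digits of a non-negative integer, most significant first."""
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, p)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, p):
    """Inverse of to_digits: interpret a digit list as a base-p integer."""
    value = 0
    for d in digits:
        value = value * p + d
    return value


def iter_add_p(operands, p):
    """Sum k digit lists column by column, propagating the carry."""
    width = max(len(x) for x in operands)
    padded = [[0] * (width - len(x)) + list(x) for x in operands]
    result, carry = [], 0
    for pos in range(width - 1, -1, -1):
        # With more than two operands the carry can exceed 1.
        carry, digit = divmod(carry + sum(x[pos] for x in padded), p)
        result.append(digit)
    while carry > 0:
        carry, digit = divmod(carry, p)
        result.append(digit)
    return result[::-1]


def add_p(a, b, p):
    """Two-operand addition is the k = 2 case of iterated addition."""
    return iter_add_p([a, b], p)


def mul_p(a, b, p, l=None):
    """Product of two digit lists; optionally keep l output digits.

    Assumption: truncation keeps the l least-significant digits (l >= 1).
    """
    product = to_digits(from_digits(a, p) * from_digits(b, p), p)
    if l is None:
        return product
    if l < 1:
        raise ValueError("l must be at least 1")
    kept = product[-l:]
    return [0] * (l - len(kept)) + kept
```

For example, `add_p([1, 0, 1], [1, 1], 2)` returns `[1, 0, 0, 0]` (5 + 3 = 8 in binary), and `iter_add_p([[9, 9], [9, 9], [9, 9]], 10)` returns `[2, 9, 7]`.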

Abstract

Despite the remarkable success of Transformer-based large language models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of LLMs' mathematical abilities, with a specific focus on their arithmetic performances. We identify numerical precision as a key factor that influences their effectiveness in arithmetical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes. We further support our theoretical findings through empirical experiments that explore the impact of varying numerical precision on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.

Paper Structure

This paper contains 43 sections, 24 theorems, 58 equations, 4 figures, 9 tables, 5 algorithms.

Key Result

Theorem 4.1

Fix integers $p\geq 2$ and $c\in \mathbb{N}^*$. Consider the tokenizer ${\bm{T}}_c$ defined in eq:tokenizer for processing the input and output sequences. There exist constant-precision Transformers with constant depth (independent of $n$) and hidden dimension $d = O(n^2)$ that can solve the ...

Figures (4)

  • Figure 1: Examples for three elementary arithmetic tasks we consider in this paper: integer addition, iterated addition, and integer multiplication.
  • Figure 2: Model performance on different tasks in base-2. Within each sub-figure, the x-axis represents the maximum digit length and the y-axis represents the accuracy achieved by each model. The figure indicates that, for all tasks, 3-layer and 5-layer Transformers using float32 outperform their bfloat16 counterparts.
  • Figure 3: Model performance on iterated addition tasks involving three numbers and integer multiplication tasks. Each sub-figure presents a comparison of the performance between float32 and bfloat16.
  • Figure 4: The performance of LLAMA-3.1-8B Instruct model on arithmetic tasks in base-10. In each sub-figure, we compare the original model in bfloat16 and the quantized model in int4, alongside fine-tuned models, with LoRA using bfloat16 and QLoRA using int4.
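As a rough illustration of why float32 and bfloat16 can behave so differently on arithmetic, the sketch below emulates bfloat16 by rounding float32 bit patterns to their upper 16 bits and then accumulates a running sum. This is not the paper's experimental code; the helper names are hypothetical, and the emulation assumes finite inputs and round-to-nearest-even.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def to_bfloat16(x):
    """Emulate bfloat16 by keeping the top 16 bits of the float32 pattern.

    Uses round-to-nearest-even; assumes x is finite.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def accumulate(values, round_fn):
    """Running sum in which every intermediate result is rounded."""
    total = round_fn(0.0)
    for v in values:
        total = round_fn(total + round_fn(v))
    return total


ones = [1.0] * 1000
bf16_sum = accumulate(ones, to_bfloat16)  # stalls at 256.0
fp32_sum = accumulate(ones, to_float32)   # reaches 1000.0
```

With 8 significant bits, the emulated bfloat16 cannot represent 257, so the running sum stops growing at 256, while the float32 sum stays exact. The experiments summarized in the figures concern trained models, so this shows only the underlying rounding effect.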

Theorems & Definitions (43)

  • Remark 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Definition B.1
  • Definition B.2: Constant-Precision Transformer
  • Definition B.3: Logarithmic-Precision Transformer
  • ...and 33 more