The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs

Tanja Baeumel, Josef van Genabith, Simon Ostermann

TL;DR

This paper investigates why autoregressive LLMs struggle with multi-operand addition, attributing the difficulty to a shallow one-digit lookahead that fails to anticipate cascading carries. Through probing experiments, a formalization of left-to-right addition, and controlled multi-operand datasets, the authors show that the carry state is not reliably represented when only a single-digit lookahead is available, and that this limitation persists across tokenization schemes. They introduce a formal carry heuristic, H1, demonstrate its predictive power for both two-operand and multi-operand addition, and provide empirical evidence that model accuracy deteriorates as the number of operands increases. The findings reveal a fundamental limitation of current LLMs in complex numerical reasoning and point to deeper lookahead as a promising direction for improving arithmetic capabilities, with practical implications for numerical tasks and algorithmic reasoning.

Abstract

Autoregressive large language models (LLMs) exhibit impressive performance across various tasks but struggle with simple arithmetic, such as addition of two or more operands. We show that this struggle arises from LLMs' use of a simple one-digit lookahead heuristic, which works fairly well (but not perfectly) for two-operand addition but fails in multi-operand cases, where the carry-over logic is more complex. Our probing experiments and digit-wise accuracy evaluation show that LLMs fail precisely where a one-digit lookahead is insufficient to account for cascading carries. We analyze the impact of tokenization strategies on arithmetic performance and show that all investigated models, regardless of tokenization, are inherently limited in the addition of multiple operands due to their reliance on a one-digit lookahead heuristic. Our findings reveal fundamental limitations that prevent LLMs from generalizing to more complex numerical reasoning.
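
To make the idea of a limited lookahead concrete, the following is a minimal illustrative sketch of left-to-right addition in which each result digit is produced from its own column plus a carry estimated from only the next few columns to the right. It is not the paper's formal definition of H1, which is not reproduced in this section; the function names `column_sums` and `lookahead_add` and the `lookahead` parameter are assumptions introduced here for illustration.

```python
def column_sums(operands):
    """Per-column digit sums, most significant column first."""
    width = max(len(str(x)) for x in operands)
    # Add leading zero columns so the result always fits.
    total_width = len(str(len(operands) * (10**width - 1)))
    padded = [str(x).zfill(total_width) for x in operands]
    return [sum(int(p[i]) for p in padded) for i in range(total_width)]


def lookahead_add(operands, lookahead=1):
    """Generate result digits left to right with a bounded carry lookahead.

    The carry into column i is estimated from the next `lookahead`
    columns only, as if no carry entered that window from further right.
    """
    sums = column_sums(operands)
    digits = []
    for i in range(len(sums)):
        window = sums[i + 1 : i + 1 + lookahead]
        window_value = 0
        for s in window:
            window_value = window_value * 10 + s
        carry = window_value // 10 ** len(window)
        digits.append((sums[i] + carry) % 10)
    return int("".join(map(str, digits)))


# No cascading carry: one-digit lookahead is enough.
print(lookahead_add([231, 562]))                       # 793 (correct)
# The ones column pushes the tens column from 9 to 10,
# which a one-digit lookahead from the hundreds column cannot see.
print(lookahead_add([147, 255]))                       # 302 (true sum is 402)
print(lookahead_add([147, 255], lookahead=2))          # 402
# Four operands: carries larger than 1 cascade across columns.
print(lookahead_add([199, 199, 199, 103]))             # 600 (true sum is 700)
print(lookahead_add([199, 199, 199, 103], lookahead=2))  # 700
```

In this sketch, a lookahead at least as wide as the operands always reproduces the exact sum, whereas a one-digit lookahead goes wrong exactly when a carry from two or more columns to the right changes the carry of the neighbouring column.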

Paper Structure

This paper contains 40 sections, 25 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: An addition of two three-digit operands. LLMs rely on a one-digit lookahead when performing addition. If a relevant carry emerges at a later stage in prediction, they fail to account for it, leading to errors in earlier generated result digits.
  • Figure 2: Accuracy of Mistral, Gemma and Llama-3 on multi-operand addition of triple-digit numbers, in a zero- and one-shot setting.
  • Figure 3: Probing accuracy of individual result digits, as predicted from the hidden states of Mistral, Gemma and Llama-3, for two-operand, zero-shot addition prompts.
  • Figure 4: Two-operand addition in which H1 is successful.
  • Figure 5: Two-operand addition in which H1 fails.
  • ...and 6 more figures