Language Models are Symbolic Learners in Arithmetic

Chunyuan Deng, Zhiqi Li, Roy Xie, Ruidi Chang, Hanjie Chen

TL;DR

The paper investigates whether language models truly compute arithmetic or rely on simple pattern shortcuts, proposing Subgroup Induction as a practical, Solomonoff-inspired framework. Subgroups define minimal input-to-output mappings with two key metrics, subgroup quality $Q(s)$ and subgroup entropy $H(s)$, which capture shortcut viability and predictive uncertainty. Empirically, LMs exhibit a robust U-shaped position-level accuracy in multi-digit multiplication, explained by the availability of high-quality, low-token subgroups for edge digits and the need for larger token budgets to handle middle digits; training dynamics align with a token-budget tree in which models progressively unlock more complex subgroups. Extending the entropy-based analysis to chain-of-thought (CoT) settings reveals that reasoning paths with lower aggregate $H'(s)$ yield better performance, reinforcing the view that LM arithmetic relies on hierarchies of simple symbolic shortcuts rather than explicit algorithms. The framework provides actionable tools for analyzing arithmetic learning and has implications for reliability, generalization, and the development of reasoning strategies in LMs.

Abstract

The prevailing question about LMs performing arithmetic is whether these models learn to truly compute or simply master superficial pattern matching. In this paper, we argue for the latter, presenting evidence that LMs act as greedy symbolic learners, prioritizing the simplest possible shortcuts that fit the statistics of the dataset when solving arithmetic tasks. To investigate this, we introduce subgroup induction, a practical framework adapted from Solomonoff Induction (SI), one of the most powerful universal predictors. Our framework analyzes arithmetic problems by breaking them down into subgroups: minimal mappings between a few input digits and a single output digit. Our primary metric, subgroup quality, measures the viability of these shortcuts. Experiments reveal a distinct U-shaped accuracy pattern in multi-digit multiplication: LMs quickly master the first and last output digits while struggling with those in the middle. We demonstrate that this U-shape is not coincidental; it perfectly mirrors the quality of the simplest possible subgroups, those requiring the fewest input tokens. This alignment suggests a core learning mechanism: LMs first learn easy, low-token shortcuts and only incorporate more complex, multi-token patterns as training progresses. They do not learn the algorithm of multiplication but rather a hierarchy of increasingly complex symbol-to-symbol mappings. Ultimately, our findings suggest that the path to arithmetic mastery for LMs is not paved with algorithms, but with a cascade of simple, hierarchically learned symbolic shortcuts.
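The position-level accuracy behind the U-shape can be measured with a few lines of code. The sketch below is only an illustration of that measurement: the function name `position_accuracy`, the zero-padding convention, and the toy predictions are assumptions made here, not the paper's evaluation code or data.

```python
def position_accuracy(predictions, targets):
    """Per-position digit accuracy over a batch of answer strings.

    Assumption: answers are right-aligned and zero-padded to the width of
    the longest target, so position 0 is the most significant digit.
    """
    width = max(len(t) for t in targets)
    correct = [0] * width
    for pred, tgt in zip(predictions, targets):
        # Right-align both strings; keep only the last `width` characters
        # of an over-long prediction.
        pred = pred.zfill(width)[-width:]
        tgt = tgt.zfill(width)
        for i in range(width):
            correct[i] += pred[i] == tgt[i]
    return [c / len(targets) for c in correct]


# Toy, hand-written predictions (not model outputs): errors sit in the middle.
targets = ["0391", "1881", "0100", "2412"]
predictions = ["0351", "1981", "0100", "2402"]
accuracy = position_accuracy(predictions, targets)
print(accuracy)  # [1.0, 0.75, 0.5, 1.0]
```

Plotting such a list against the output position is what would show the U-shape when the first and last digits are learned before the middle ones.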

Paper Structure

This paper contains 57 sections, 1 theorem, 20 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1.4

The theoretical SI framework and the practical LM implementation approach arithmetic learning through parallel mechanisms:

  1. SI via Bayesian updating over programs.
  2. LMs via gradient descent on negative log-likelihood.
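For orientation, the two mechanisms named in the proposition are commonly written as follows. These are textbook forms given as an assumption, not the paper's own equations; the symbols $\ell(p)$ (program length in bits), $D$ (observed data), $\theta$ (model parameters), and $\eta$ (learning rate) are introduced here for illustration only.

```latex
% Textbook forms, assumed for illustration; not the paper's own equations.
% (1) Bayesian updating over programs p with a length-based (Occam) prior:
P(p \mid D) \;=\; \frac{2^{-\ell(p)}\, P(D \mid p)}{\sum_{p'} 2^{-\ell(p')}\, P(D \mid p')}

% (2) Gradient descent on the negative log-likelihood:
\theta_{t+1} \;=\; \theta_t - \eta\, \nabla_\theta \mathcal{L}(\theta_t),
\qquad
\mathcal{L}(\theta) \;=\; -\sum_{(x, y) \in D} \log P_\theta(y \mid x)
```

In both cases the update shifts weight toward hypotheses that explain the data, with the prior $2^{-\ell(p)}$ favoring shorter programs on the SI side.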

Figures (7)

  • Figure 1: Overview of Subgroup Induction. Solomonoff Induction (SI) is a conceptual framework that utilizes a universal predictor, such as UTMs, to make predictions. Inspired by SI's principle of Occam's Razor, subgroup induction is a pragmatic framework designed to uncover the shortcut-seeking mechanisms LMs use to perform arithmetic.
  • Figure 2: Position-level Accuracy from Gemma-2-2B and Llama-3.1-8B.
  • Figure 3: Tree Structure for 2-digit multiplication. Given an output position $\mathbb{C}$, subgroups can be organized into a hierarchical tree structure. Each layer represents the number of tokens used by the corresponding subgroup. We do not draw the lowest layer (e.g., $A_1$) in this figure, as its quality equals 0.
  • Figure 4: Position-level Subgroup Quality. The low-to-high order reflects the hierarchy in the tree structure. The $3$- to $5$-digit settings show trends similar to the $2$-digit setting (see the Appendix).
  • Figure 5: Searching Program inside LMs. The phases correspond to different stages of LM fitting.
  • ...and 2 more figures
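The layered tree described in the Figure 3 caption can be sketched by grouping candidate input-token subsets by how many tokens they use. This is a minimal illustration, assuming input tokens named `A1, A2, B1, B2` for 2-digit operands and treating every subset as a candidate; the helper `token_budget_tree` is hypothetical, and the paper's subgroup definition may restrict which subsets count.

```python
from itertools import combinations


def token_budget_tree(n_digits=2):
    """Group input-token subsets by size (the token budget of each layer).

    Assumption: operands A and B each have `n_digits` digit tokens, named
    A1..An and B1..Bn from most to least significant.
    """
    tokens = [f"A{i}" for i in range(1, n_digits + 1)]
    tokens += [f"B{i}" for i in range(1, n_digits + 1)]
    return {k: list(combinations(tokens, k)) for k in range(1, len(tokens) + 1)}


tree = token_budget_tree(2)
for budget, subsets in tree.items():
    print(budget, len(subsets), subsets[:3])
```

For 2-digit operands this gives layers of 4, 6, 4, and 1 subsets for budgets 1 through 4; the single-token layer is the one the Figure 3 caption omits.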

Theorems & Definitions (7)

  • Definition 1.1: Universal Probability
  • Definition 1.2: Occam's Razor
  • Definition 1.3: Bayesian Updating
  • Proposition 1.4: Parallel Learning Frameworks
  • Definition 2.1: Subgroup
  • Definition 2.2: Subgroup Quality
  • Definition 2.3: Subgroup Entropy
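To make the subgroup, quality, and entropy definitions concrete, here is a small sketch for 2-digit multiplication. The exact formulas of Definitions 2.2 and 2.3 are not reproduced in this summary, so the code uses stand-in proxies that are assumptions: quality is taken as the fraction of examples whose selected input digits determine the output digit uniquely, and entropy as the conditional entropy (in bits) of the output digit given those input digits. The names `make_dataset` and `subgroup_stats` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2


def make_dataset(n_digits=2):
    """All n-digit x n-digit products as (input tokens, zero-padded output)."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits
    data = []
    for a, b in product(range(lo, hi), repeat=2):
        inp = str(a) + str(b)                 # tokens A1 A2 B1 B2 for n_digits=2
        out = str(a * b).zfill(2 * n_digits)  # assumption: fixed-width output
        data.append((inp, out))
    return data


def subgroup_stats(data, input_idx, out_pos):
    """Return (quality proxy, entropy proxy) for one subgroup.

    input_idx: indices of the input tokens the subgroup may look at.
    out_pos:   index of the output digit the subgroup predicts.
    """
    table = defaultdict(Counter)
    for inp, out in data:
        key = tuple(inp[i] for i in input_idx)
        table[key][out[out_pos]] += 1
    total = len(data)
    # Proxy quality: share of examples whose key maps to exactly one digit.
    deterministic = sum(sum(c.values()) for c in table.values() if len(c) == 1)
    # Proxy entropy: H(output digit | selected input digits), in bits.
    entropy = 0.0
    for counts in table.values():
        n = sum(counts.values())
        h = -sum((k / n) * log2(k / n) for k in counts.values())
        entropy += (n / total) * h
    return deterministic / total, entropy


data = make_dataset(2)
q_last, h_last = subgroup_stats(data, (1, 3), 3)    # A2, B2 -> last digit
q_mid, h_mid = subgroup_stats(data, (1, 3), 1)      # A2, B2 -> a middle digit
q_first, h_first = subgroup_stats(data, (0, 2), 0)  # A1, B1 -> first digit
q_single, h_single = subgroup_stats(data, (0,), 0)  # A1 alone -> first digit
print(q_last, h_last, q_single)  # 1.0 0.0 0.0
```

Under these proxies the last output digit is fully determined by `A2` and `B2` (quality 1, entropy 0), while `A1` alone determines nothing, which is consistent with the Figure 3 remark that the single-token layer has quality 0.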