Language Models are Symbolic Learners in Arithmetic
Chunyuan Deng, Zhiqi Li, Roy Xie, Ruidi Chang, Hanjie Chen
TL;DR
The paper investigates whether language models truly compute arithmetic or rely on simplistic pattern shortcuts, proposing Subgroup Induction as a practical, Solomonoff-inspired framework. Subgroups define minimal input-to-output mappings, characterized by two key metrics, subgroup quality $Q(s)$ and subgroup entropy $H(s)$, which capture shortcut viability and predictive uncertainty respectively. Empirically, LMs exhibit a robust U-shaped position-level accuracy in multi-digit multiplication, explained by the availability of high-quality, low-token subgroups for edge digits and the need for larger token budgets to handle middle digits. Training dynamics align with a token-budget tree in which models progressively unlock more complex subgroups. Extending the entropy-based analysis to CoT settings reveals that reasoning paths with lower aggregate $H'(s)$ yield better performance, reinforcing the view that LM arithmetic relies on hierarchies of simple symbolic shortcuts rather than explicit algorithms. The framework provides actionable tools for analyzing arithmetic learning and has implications for reliability, generalization, and the development of reasoning strategies in LMs.
Abstract
A prevailing question about LMs performing arithmetic is whether these models truly learn to compute or simply master superficial pattern matching. In this paper, we argue for the latter, presenting evidence that LMs act as greedy symbolic learners that prioritize the simplest possible shortcuts for fitting the statistics of the dataset when solving arithmetic tasks. To investigate this, we introduce subgroup induction, a practical framework adapted from Solomonoff Induction (SI), one of the most powerful universal predictors. Our framework analyzes arithmetic problems by breaking them down into subgroups: minimal mappings between a few input digits and a single output digit. Our primary metric, subgroup quality, measures the viability of these shortcuts. Experiments reveal a distinct U-shaped accuracy pattern in multi-digit multiplication: LMs quickly master the first and last output digits while struggling with those in the middle. We demonstrate that this U-shape is not coincidental; it perfectly mirrors the quality of the simplest possible subgroups, those requiring the fewest input tokens. This alignment suggests a core learning mechanism: LMs first learn easy, low-token shortcuts and only incorporate more complex, multi-token patterns as training progresses. They do not learn the algorithm of multiplication but rather a hierarchy of increasingly complex symbol-to-symbol mappings. Ultimately, our findings suggest that the path to arithmetic mastery for LMs is not paved with algorithms, but with a cascade of simple, hierarchically learned symbolic shortcuts.
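To make the idea of a low-token subgroup concrete, the following is a minimal sketch and not the paper's definition of subgroup quality. It assumes a simple proxy: for each output digit position of an n-digit by n-digit product, the best accuracy a majority-vote lookup table can reach when it sees only one digit of each operand. The function name `shortcut_accuracy`, the zero-padding of the product, and the majority-vote rule are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import product


def shortcut_accuracy(n_digits=2):
    """Proxy for shortcut viability (an assumption, not the paper's Q(s)).

    For every output position of a * b, try each subgroup made of one digit
    of a and one digit of b, predict the output digit by majority vote within
    each subgroup key, and keep the best accuracy over all such subgroups.
    """
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits
    out_len = 2 * n_digits  # products are left-padded with zeros to this width
    pairs = [(a, b) for a in range(lo, hi) for b in range(lo, hi)]

    best = []
    for pos in range(out_len):
        best_acc = 0.0
        for i, j in product(range(n_digits), repeat=2):
            # Count output digits observed for each (digit of a, digit of b) key.
            table = defaultdict(Counter)
            for a, b in pairs:
                key = (str(a)[i], str(b)[j])
                out_digit = str(a * b).zfill(out_len)[pos]
                table[key][out_digit] += 1
            # Majority vote: each key always predicts its most frequent output.
            correct = sum(c.most_common(1)[0][1] for c in table.values())
            best_acc = max(best_acc, correct / len(pairs))
        best.append(best_acc)
    return best
```

Calling `shortcut_accuracy(2)` returns one score per output position, most significant digit first. The last position always scores 1.0, because the final digit of a product depends only on the final digits of the operands, whereas no two-digit subgroup fully determines a middle position. Comparing the scores across positions gives a rough way to check how well such minimal shortcuts track the position-level pattern described above.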
