
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles

Antara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis

TL;DR

The study examines how language models combine linguistic and mathematical reasoning across multilingual numeral systems by systematically isolating the linguistic components from the mathematical ones. It runs a controlled set of experiments on problems drawn from linguistics Olympiads and tests multiple LLMs: the experiments manipulate how explicitly operators are marked, supply contextual information (such as the problem language or numeral base), and ablate individual numeral parameters using minimal-pair problems. The key finding is that models cannot consistently solve linguistic-mathematical puzzles unless the mathematical operations are explicitly marked with familiar symbols like $+$ and $\times$, which points to a gap in inferring implicit numeral structure. This work highlights the limits of current reasoning models in cross-domain adaptability and underscores the need for approaches that can flexibly infer compositional rules from implicit patterns in human-scale data.
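To make the operator-explicitness manipulation concrete, the sketch below renders one numeral structure in three ways: implicitly (words simply juxtaposed), with an unfamiliar operator marker, and with familiar symbols like $+$ and $\times$. The tree representation, the function `render`, and the unfamiliar markers `#` and `@` are hypothetical choices for illustration; they are not the paper's materials or code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Op:
    """A binary numeral operation, e.g. Op("add", "twenty", "three")."""
    kind: str  # "add" or "mul"
    left: "Node"
    right: "Node"


Node = Union[str, Op]

# Familiar symbols, as in "twenty + three". The unfamiliar markers are
# hypothetical placeholders chosen for this sketch.
SYMBOLS = {
    "familiar": {"add": "+", "mul": "×"},
    "unfamiliar": {"add": "#", "mul": "@"},
}


def render(node: Node, style: str) -> str:
    """Render a numeral tree in the 'implicit', 'familiar', or 'unfamiliar' style."""
    if isinstance(node, str):
        return node
    left = render(node.left, style)
    right = render(node.right, style)
    if style == "implicit":
        # Implicit condition: words are juxtaposed, as in natural language.
        return f"{left} {right}"
    # Explicit conditions: parenthesize nested operations so the structure is unambiguous.
    if isinstance(node.left, Op):
        left = f"({left})"
    if isinstance(node.right, Op):
        right = f"({right})"
    return f"{left} {SYMBOLS[style][node.kind]} {right}"


numeral = Op("add", Op("mul", "three", "hundred"), "five")
for style in ("implicit", "unfamiliar", "familiar"):
    print(f"{style:>10}: {render(numeral, style)}")
```

All three renderings encode the same value; only the familiar-symbol version states the operations in a notation a model has seen extensively, while the implicit version leaves the add/multiply structure to be inferred.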

Abstract

Across languages, numeral systems vary widely in how they construct and combine numbers. Humans consistently learn to navigate this diversity and can learn to solve linguistic-mathematical puzzles involving cross-linguistic numeral systems, but large language models (LLMs) struggle with such puzzles. We investigate why this task is difficult for LLMs through a series of experiments that untangle the linguistic and mathematical aspects of numbers in language. Our experiments establish that models cannot consistently solve such problems unless the mathematical operations in the problems are explicitly marked using known symbols ($+$, $\times$, etc., as in "twenty + three"). In further ablation studies, we probe how individual parameters of numeral construction and combination affect performance. While humans use their linguistic understanding of numbers to make inferences about the implicit compositional structure of numerals, LLMs seem to lack this notion of implicit numeral structure. We conclude that the ability to flexibly infer compositional rules from implicit patterns in human-scale data remains an open challenge for current reasoning models.
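As an illustration of what "implicit compositional structure" means, the sketch below evaluates English-style numerals that carry no operator symbols. The evaluator must recover, from word type and position alone, that "three hundred" means 3 × 100 while "twenty three" means 20 + 3. The vocabulary and the rule in the hypothetical function `parse_implicit` are assumptions covering only a small subset of English; other languages construct numerals differently, and this is not the paper's method.

```python
# Hypothetical toy vocabulary: a small subset of English number words.
VALUES = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}
MULTIPLIERS = {"hundred": 100, "thousand": 1000}


def parse_implicit(numeral: str) -> int:
    """Evaluate an operator-free numeral by inferring add vs. multiply.

    Assumed rule: a multiplier word multiplies the group built so far,
    and any other word is added to that group. "thousand" closes the group.
    """
    total = 0    # value of groups already closed by "thousand"
    current = 0  # value of the group currently being built
    for word in numeral.split():
        if word in MULTIPLIERS:
            # A bare multiplier ("hundred") is read as one hundred.
            current = max(current, 1) * MULTIPLIERS[word]
            if MULTIPLIERS[word] >= 1000:
                total += current
                current = 0
        else:
            current += VALUES[word]
    return total + current


for text in ("twenty three", "three hundred twenty three", "two thousand five hundred"):
    print(text, "->", parse_implicit(text))
```

For example, `parse_implicit("three hundred twenty three")` returns 323: the same juxtaposition of words means multiplication before "hundred" and addition elsewhere, which is exactly the kind of rule that an explicit "+" or "×" would otherwise state outright.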

Paper Structure

This paper contains 15 sections, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Making operators explicit significantly improves performance. Results for the explicit operator experiments in the single-character variable case (for results on multi-character variables, see the corresponding appendix and figure). Making operators explicit improves performance over the implicit condition, but the improvement is substantial and reliable only when the operator is marked with a familiar symbol like "+". Error bars denote standard error of the mean; 10 problems, 5 iterations per problem.
  • Figure 2: Language and base information helps only in the implicit case. Effect of adding language or numeral-base information for o1-mini, plotted as a difference from the baseline values in Figure 1. In cases with explicit operators, conflating overtly mathematical and linguistic information appears to confuse the models.
  • Figure 3: Extra information improves performance on implicit problems (A B). Information about implicitness helps, but less than more direct information such as the problem language. Error bars denote standard error of the mean; 5 iterations per problem.
  • Figure 4: Example of a full minimal-pair template problem for the Order parameter, in which we varied whether digits are read left-to-right or right-to-left.
  • Figure 5: Drehu (IOL 2010) problem
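To illustrate the minimal-pair idea behind the Order parameter in Figure 4, the sketch below builds a toy positional numeral system and writes the same numbers with digits read left-to-right or right-to-left, holding everything else fixed. The base-5 digit words and the functions `to_digits`, `encode`, and `decode` are hypothetical; the paper's actual template and its other parameters are not reproduced here.

```python
# Hypothetical base-5 digit words for a toy language (not from the paper).
DIGIT_WORDS = ["nu", "ka", "po", "ti", "me"]
BASE = len(DIGIT_WORDS)


def to_digits(n: int) -> list[int]:
    """Base-5 digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, BASE)
        digits.append(d)
    return digits[::-1]


def encode(n: int, order: str) -> str:
    """Write n in the toy language; `order` is the only parameter that varies."""
    digits = to_digits(n)
    if order == "right-to-left":
        digits = digits[::-1]  # least significant digit first
    return " ".join(DIGIT_WORDS[d] for d in digits)


def decode(numeral: str, order: str) -> int:
    """Invert `encode` for the same setting of the Order parameter."""
    digits = [DIGIT_WORDS.index(word) for word in numeral.split()]
    if order == "right-to-left":
        digits = digits[::-1]
    value = 0
    for d in digits:
        value = value * BASE + d
    return value


# A minimal pair: identical numbers and words, only the reading order differs.
for n in (7, 13, 19):
    print(n, encode(n, "left-to-right"), "|", encode(n, "right-to-left"))
```

Because only `order` differs between the two renderings, any difference in a model's accuracy across the pair can be attributed to that single parameter, which is the logic of a minimal-pair ablation.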