What is a Number, That a Large Language Model May Know It?

Raja Marjieh, Veniamin Veselovsky, Thomas L. Griffiths, Ilia Sucholutsky

TL;DR

The paper examines how large language models (LLMs) represent numbers when digit tokens can function as either numeric values or strings, a duality that creates polysemy-like ambiguity. It introduces a cognitive-science–inspired similarity-judgment protocol to map numbers into a joint space across six models, quantifying the extent to which observed similarities align with Levenshtein distance $d_{Lev}$ and Log-Linear distance $d_{Log}$. The authors find an entangled representation that mixes string-like and numerical structure, with context cues such as int() vs str() able to bias the balance, and with internal embedding probes revealing partial separation of the two subspaces. A realistic, triplet-based decision task demonstrates that string bias can influence behavior, particularly for longer numbers, highlighting a fundamental tension in transformer numeracy. The work suggests concrete directions for mitigating such biases and underscores the need to understand numeral representations beyond purely arithmetic contexts.

Abstract

Numbers are a basic part of how humans represent and describe the world around them. As a consequence, learning effective representations of numbers is critical for the success of large language models as they become more integrated into everyday decisions. However, these models face a challenge: depending on context, the same sequence of digit tokens, e.g., 911, can be treated as a number or as a string. What kind of representations arise from this duality, and what are its downstream implications? Using a similarity-based prompting technique from cognitive science, we show that LLMs learn representational spaces that blend string-like and numerical representations. In particular, we show that elicited similarity judgments from these models over integer pairs can be captured by a combination of Levenshtein edit distance and numerical Log-Linear distance, suggesting an entangled representation. In a series of experiments we show how this entanglement is reflected in the latent embeddings, how it can be reduced but not entirely eliminated by context, and how it can propagate into a realistic decision scenario. These results shed light on a representational tension in transformer models that must learn what a number is from text input.
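
The decomposition described above can be illustrated with a minimal sketch. This is not the authors' code: the exact form of the Log-Linear distance is not given in this summary, so the sketch assumes $d_{Log}(x, y) = |\log(x+1) - \log(y+1)|$, and it assumes the fit is an ordinary least-squares regression with an intercept. The function names (`levenshtein`, `log_distance`, `r_squared`, `demo`) are hypothetical, and the similarity scores are synthetic stand-ins for model judgments, not results from the paper.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def log_distance(x: int, y: int) -> float:
    """ASSUMED numerical distance: |log(x + 1) - log(y + 1)|.

    The +1 offset keeps the value finite at 0; the paper may define it differently.
    """
    return float(abs(np.log1p(x) - np.log1p(y)))


def r_squared(predictors, target) -> float:
    """R^2 of a least-squares fit of `target` on `predictors` plus an intercept."""
    target = np.asarray(target, dtype=float)
    X = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((target - target.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def demo(n: int = 50, seed: int = 0) -> dict:
    """Fit combined and separate predictors to SYNTHETIC similarity scores."""
    rng = np.random.default_rng(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d_lev = np.array([levenshtein(str(i), str(j)) for i, j in pairs], dtype=float)
    d_log = np.array([log_distance(i, j) for i, j in pairs])
    # Synthetic "entangled" similarity: an equal mix of both distances plus noise.
    sim = -(0.5 * d_lev + 0.5 * d_log) + rng.normal(scale=0.1, size=len(pairs))
    return {
        "combined": r_squared([d_lev, d_log], sim),
        "string": r_squared([d_lev], sim),
        "numeric": r_squared([d_log], sim),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: R^2 = {value:.3f}")
```

The two distances can disagree sharply, which is what makes the decomposition informative: 9 and 10 are numerically adjacent but two edits apart as strings, whereas 100 and 900 differ by a single edit yet are numerically far apart.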

Paper Structure

This paper contains 28 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: LLM number similarity matrices (symmetrized) over all integer pairs in the range $0$–$999$, along with two theoretical similarity matrices derived from a Levenshtein string edit distance and a psychological Log-Linear numerical distance (highlighted in black).
  • Figure 2: Context effects on LLM-elicited number similarity matrices and their decomposition. A. LLM similarity matrices under the effect of 'type' specification: int() vs. str() (see Appendix \ref{app:prompts} for prompts). B. Coefficient of determination ($R^2$) for the different similarity matrices under the default (Figure \ref{fig:default-sim}), int(), and str() contexts for the combined and separate Levenshtein (string) and Log-Linear (numerical) distance predictors (error bars indicate 95% confidence intervals; see Methodology).
  • Figure 3: The effect of other number bases on elicited similarity. A. LLM similarity matrices over all integer pairs in the range $0$–$999$ represented in base 4 and 8 along with the corresponding Levenshtein distance measures (see Appendix \ref{app:prompts} for prompts). B. Coefficient of determination ($R^2$) for the various similarity matrices under the different base contexts (including the base 10 results from Figure \ref{fig:default-sim}) for the combined and separate Levenshtein (string) and Log-Linear (numerical) distance predictors (error bars indicate 95% CIs).
  • Figure 4: Decoded string and integer subspaces from Llama-3.1-8b using linear probes (see Methodology). The decoded similarity matrices are provided as insets along with their multidimensional scaling solutions (MDS). Integers are labeled every 5 points.
  • Figure 5: Probing string-bias in a naturalistic decision scenario. Bar plots indicate the fraction of times an incorrect (Levenshtein-aligned) option was chosen for the 3-digit and 5-digit scenarios considered, and the two possible presentation orders (see Methodology). '(Rev.)' indicates the case in which the Levenshtein-aligned option was presented second (see prompt in Appendix \ref{app:prompts}).
  • ...and 3 more figures