Language Models Do Not Embed Numbers Continuously

Alex O. Davies, Roussel Nzoyem, Nirav Ajmeri, Telmo M. Silva Filho

TL;DR

The paper investigates whether language-model embeddings treat continuous numbers as a genuine one-dimensional continuum or as a discretized, noisy space. It introduces a model-agnostic evaluation framework that quantifies numerical fidelity with three metrics (linear $R^2$, PCA correlation, and explained variance) and applies it to embedding models from OpenAI, Google Gemini, and Voyage AI. The findings show high linear reconstructability ($R^2 \geq 0.95$) but low variance explained by the first PCA component, with fidelity deteriorating as decimal precision increases, which reveals substantial non-continuous structure and noise. These results highlight the limitations of current numeric embeddings for precision-heavy tasks and motivate the development of numerically specialized architectures or denoising strategies.

Abstract

Recent research has extensively studied how large language models manipulate integers in specific arithmetic tasks and, on a more fundamental level, how they represent numeric values. These previous works have found that language model embeddings can be used to reconstruct the original values; however, they do not evaluate whether language models actually model continuous values as continuous. Using expected properties of the embedding space, including linear reconstruction and principal component analysis, we show that language models not only represent numeric spaces as non-continuous but also introduce significant noise. Using models from three major providers (OpenAI, Google Gemini and Voyage AI), we show that while reconstruction is possible with high fidelity ($R^2 \geq 0.95$), principal components only explain a minor share of variation within the embedding space. This indicates that many components within the embedding space are orthogonal to the simple numeric input space. Further, both linear reconstruction and explained variance suffer with increasing decimal precision, despite the ordinal nature of the input space being fundamentally unchanged. The findings of this work therefore have implications for the many areas where embedding models are used, in particular where high numerical precision, large magnitudes or mixed-sign values are common.
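
The three quantities named in the abstract (linear reconstruction $R^2$, variance explained by the first principal component, and the correlation between that component and the input scalars) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `embed_stub` is a hypothetical stand-in for a provider embedding call (the scalar on one axis plus Gaussian noise), the data are synthetic, and the paper's exact metric definitions may differ from the plain least-squares and power-iteration versions used here.

```python
import math
import random

def embed_stub(x, dim, noise, rng):
    # Hypothetical embedding (assumption): x on axis 0, Gaussian noise on every axis.
    return [(x if j == 0 else 0.0) + rng.gauss(0.0, noise) for j in range(dim)]

def center(rows):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def solve(A, b):
    # Gaussian elimination with partial pivoting for the square system A w = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w

def linear_r2(E, x):
    # In-sample R^2 of a least-squares fit predicting the scalars x from embeddings E.
    Ec = center(E)
    xm = sum(x) / len(x)
    xc = [v - xm for v in x]
    d = len(Ec[0])
    gram = [[sum(r[i] * r[j] for r in Ec) for j in range(d)] for i in range(d)]
    for i in range(d):
        gram[i][i] += 1e-9  # tiny ridge term, only for numerical safety
    rhs = [sum(r[i] * t for r, t in zip(Ec, xc)) for i in range(d)]
    w = solve(gram, rhs)
    pred = [sum(wi * ri for wi, ri in zip(w, r)) for r in Ec]
    ss_res = sum((t - p) ** 2 for t, p in zip(xc, pred))
    ss_tot = sum(t ** 2 for t in xc)
    return 1.0 - ss_res / ss_tot

def first_pc(E, iters=200):
    # Projections onto the top principal direction and its explained-variance ratio.
    Ec = center(E)
    d = len(Ec[0])
    cov = [[sum(r[i] * r[j] for r in Ec) / (len(Ec) - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):  # power iteration on the covariance matrix
        u = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        v = [c / norm for c in u]
    eig = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    ratio = eig / sum(cov[i][i] for i in range(d))
    scores = [sum(vi * ri for vi, ri in zip(v, r)) for r in Ec]
    return scores, ratio

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

rng = random.Random(0)
xs = [i / 10 for i in range(200)]  # synthetic scalars with one decimal place
E = [embed_stub(x, dim=8, noise=1.0, rng=rng) for x in xs]

r2 = linear_r2(E, xs)
scores, ratio = first_pc(E)
corr = abs(pearson(scores, xs))
print(f"linear R^2 = {r2:.3f}, PC1 explained variance = {ratio:.3f}, "
      f"|corr(PC1, x)| = {corr:.3f}")
```

To apply the same measurements to real embeddings, `embed_stub` would be replaced by a call to the embedding model under study, and `xs` by scalars rendered at the decimal precision of interest. The numbers printed for the synthetic stub say nothing about any real model.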

Paper Structure

This paper contains 13 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Visualisations of our examples for LLM embeddings in scientific knowledge applications. Top: In material science, the magnitude of a concentration is crucial, but repeated mantissas in numerical representations could cause incorrect retrieval. Bottom: In astronomy, a negatively signed value may not indicate it is semantically opposite to its positive counterpart, such as in measuring velocities within galactic disks.
  • Figure 2: Framework for measuring numerical embedding quality. Scalars are embedded into high-dimensional space and evaluated using linear reconstruction and PCA to quantify preservation of numerical structure through three complementary metrics, defined in Equations (\ref{eqn:linear-corr}), (\ref{eqn:expl-pca}) and (\ref{eqn:corr-pca}).
  • Figure 3: Decimal precision for each dataset plotted against the $R^2$ score of the linear model reconstructing the original scalars $X$ from their embedded counterparts $\hat{X}$.
  • Figure 6: Visualisations of the first two principal components of embeddings of the integers $x \in [0,1000]$ (left) and $x \in [-1000,1000]$ (right) for the main OpenAI, Voyage and Gemini models. '$\times$' symbols mark $x=0$.
  • Figure 7: PCA over randomly sampled $x \in [0,10\textrm{k}], |X|=1000$ (left) and $x \in [-10\textrm{k},10\textrm{k}], |X|=2000$ (right) for the main OpenAI, Voyage and Gemini models.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Example 1: Climate Science
  • Example 2: Drug Discovery
  • Example 3: Astronomy