Unravelling the Mechanisms of Manipulating Numbers in Language Models

Michal Štefánik, Timothee Mickus, Marek Kadlčík, Bertram Højer, Michal Spiegel, Raúl Vázquez, Aman Sinha, Josef Kuchař, Philipp Mondorf

TL;DR

The paper addresses why LLMs struggle with numeric accuracy by showing that numbers are encoded in universal sinusoidal representations that are consistent across models and layers. It introduces robust sinusoidal probes, together with representational similarity analysis (RSA) and Fourier-based analyses, to extract numeric values from hidden states and trace errors to specific layers, demonstrating consistency across models and input contexts. The authors show that multi-token numbers are systematically superposed and that many arithmetic errors can be attributed to particular layers, which suggests targeted avenues for improving architectures and probing methods. This work advances the interpretability and robustness of numeric reasoning in LLMs by linking representation structure to error sources and by highlighting the value of probes fitted on natural-language contexts.
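
The probing idea summarized above can be illustrated with a small sketch. It assumes that each number n is targeted as cosine and sine features at a hypothetical set of periods, fits a ridge-regression probe from hidden states to those features, and decodes by choosing the candidate number whose features lie closest to the prediction. The period set, the ridge penalty and the synthetic "hidden states" are all assumptions for illustration, not the authors' exact probe.

```python
import numpy as np

# Hypothetical period set; the paper's actual frequencies may differ.
PERIODS = np.array([2.0, 5.0, 10.0, 100.0, 1000.0])


def sinusoidal_features(numbers: np.ndarray) -> np.ndarray:
    """Map each integer n to [cos(2*pi*n/T), sin(2*pi*n/T)] for every period T."""
    angles = 2 * np.pi * numbers[:, None] / PERIODS[None, :]
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=1)


def fit_probe(hidden: np.ndarray, numbers: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Ridge regression from hidden states (n, d) to sinusoidal targets (n, 2 * len(PERIODS))."""
    targets = sinusoidal_features(numbers)
    gram = hidden.T @ hidden + ridge * np.eye(hidden.shape[1])
    return np.linalg.solve(gram, hidden.T @ targets)


def decode(hidden: np.ndarray, weights: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return the candidate whose sinusoidal features lie closest to each prediction."""
    predicted = hidden @ weights
    cand = sinusoidal_features(candidates)
    # Squared Euclidean distances via ||p||^2 - 2 p.c + ||c||^2, shape (n_hidden, n_candidates).
    dists = (
        (predicted ** 2).sum(axis=1)[:, None]
        - 2 * predicted @ cand.T
        + (cand ** 2).sum(axis=1)[None, :]
    )
    return candidates[np.argmin(dists, axis=1)]


# Synthetic stand-in for LLM activations: a random linear mix of the features plus noise.
rng = np.random.default_rng(0)
numbers = np.arange(1000)
mixing = rng.normal(size=(2 * len(PERIODS), 64))
hidden = sinusoidal_features(numbers) @ mixing + 0.01 * rng.normal(size=(len(numbers), 64))

weights = fit_probe(hidden, numbers)
accuracy = float((decode(hidden, weights, numbers) == numbers).mean())
print(f"decoding accuracy: {accuracy:.3f}")
```

On real activations, `hidden` would hold one layer's hidden states at the number token; fitting one such probe per layer and comparing accuracies is one way to see at which depth numeric information is most decodable.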

Abstract

Recent work has shown that different large language models (LLMs) converge to similar and accurate input embedding representations for numbers. These findings conflict with the documented propensity of LLMs to produce erroneous outputs when dealing with numeric information. In this work, we aim to explain this conflict by exploring how language models manipulate numbers and by quantifying the lower bounds of accuracy of these mechanisms. We find that, despite the errors that surface in their outputs, different language models learn interchangeable representations of numbers that are systematic, highly accurate and universal across their hidden states and the types of input contexts. This allows us to create universal probes for each LLM and to trace information, including the causes of output errors, to specific layers. Our results lay the groundwork for a fundamental understanding of how pre-trained LLMs manipulate numbers and outline the potential of more accurate probing techniques for targeted refinements of LLMs' architectures.
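
Whether two models learn interchangeable number representations is commonly measured with representational similarity analysis (RSA). The sketch below assumes Euclidean-distance dissimilarity matrices and a Spearman correlation of their upper triangles (common RSA choices; the paper's exact distance and correlation may differ), and it uses synthetic stand-ins for two models' hidden states. Because RSA compares only pairwise distances between the same inputs, it can relate representations of different widths, here 256 and 384 dimensions.

```python
import numpy as np


def rdm(reps: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: pairwise Euclidean distances, shape (n, n)."""
    sq = (reps ** 2).sum(axis=1)
    d2 = sq[:, None] - 2 * reps @ reps.T + sq[None, :]
    return np.sqrt(np.clip(d2, 0.0, None))


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def rsa_score(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Correlate the upper triangles of two RDMs; row i of both inputs must describe the same number."""
    iu = np.triu_indices(reps_a.shape[0], k=1)
    return spearman(rdm(reps_a)[iu], rdm(reps_b)[iu])


rng = np.random.default_rng(0)
numbers = np.arange(200)
periods = np.array([2.0, 5.0, 10.0, 100.0, 1000.0])  # hypothetical periods
angles = 2 * np.pi * numbers[:, None] / periods[None, :]
features = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)

# Two hypothetical "models" embedding the same periodic features in different bases and widths.
model_a = features @ rng.normal(size=(features.shape[1], 256)) + 0.01 * rng.normal(size=(200, 256))
model_b = features @ rng.normal(size=(features.shape[1], 384)) + 0.01 * rng.normal(size=(200, 384))
unrelated = rng.normal(size=(200, 256))  # baseline with no shared structure

related_score = rsa_score(model_a, model_b)
baseline_score = rsa_score(model_a, unrelated)
print(f"related: {related_score:.3f}  unrelated: {baseline_score:.3f}")
```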

Paper Structure

This paper contains 18 sections, 1 equation, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Representational similarity analysis (RSA) scores
  • Figure 2: Intersection-over-union of top $k=63$ Fourier base frequencies
  • Figure 3: Accuracy of decoding numeric input token from internal activations of language models
  • Figure 4: Generalization of probes fitted on natural-language occurrences of numeric tokens (solid line), and synthetic, mathematical contexts (dashed line).
  • Figure 5: Probe accuracy on activations from unseen layers.
  • ...and 11 more figures
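
Figure 2 compares models by the intersection-over-union of their top k = 63 Fourier base frequencies. The following is a minimal sketch of such a comparison, assuming that the embeddings are indexed by integer value, that the FFT is taken along the number axis, and that magnitudes are averaged over hidden dimensions (all assumptions, not necessarily the paper's procedure). The demo uses k = 5 because its synthetic data contains only five periods, each dividing 1000 so that it falls exactly on one frequency bin.

```python
import numpy as np


def top_k_frequencies(embeddings: np.ndarray, k: int) -> set:
    """Indices of the k strongest frequencies along the number axis.

    embeddings has shape (n_numbers, d); row i represents the number i.
    The magnitude spectrum is averaged over hidden dimensions (an assumption).
    """
    spectrum = np.abs(np.fft.rfft(embeddings, axis=0)).mean(axis=1)
    return set(np.argsort(spectrum)[-k:].tolist())


def iou(a: set, b: set) -> float:
    """Intersection over union of two sets of frequency indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


rng = np.random.default_rng(0)
numbers = np.arange(1000)
periods = np.array([2.0, 5.0, 10.0, 100.0, 1000.0])  # hypothetical periods
angles = 2 * np.pi * numbers[:, None] / periods[None, :]
features = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)

# Two hypothetical models mixing the same periodic features into different widths.
emb_a = features @ rng.normal(size=(10, 64)) + 0.01 * rng.normal(size=(1000, 64))
emb_b = features @ rng.normal(size=(10, 96)) + 0.01 * rng.normal(size=(1000, 96))

top_a = top_k_frequencies(emb_a, k=5)
top_b = top_k_frequencies(emb_b, k=5)
print(sorted(top_a), sorted(top_b), iou(top_a, top_b))
```

A frequency index f corresponds to a period of 1000 / f numbers here, so bins 500, 200, 100, 10 and 1 recover the periods 2, 5, 10, 100 and 1000.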