Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs

Jiandong Shao, Yao Lu, Jianfei Yang

TL;DR

This work investigates numerical hallucinations in LLMs through the lens of Benford-like digit distributions in pretraining data. It introduces the Digit Bias Benchmark to disentangle task priors from generation bias and demonstrates that open-source LLMs overgenerate small digits, with the first incorrect digits showing even stronger bias. Layerwise logit-lens analysis and neuron-level scrutiny reveal that a subset of highly digit-selective FFN neurons in late layers encodes preferences aligned with corpus statistics, and pruning these neurons can partially mitigate biased outputs. The results provide causal evidence that fine-grained corpus-level digit biases contribute to numerical hallucination and offer a targeted probing method for diagnosing and counteracting such errors.

Abstract

Large Language Models (LLMs) exhibit impressive performance on complex reasoning tasks, yet they frequently fail on basic numerical problems, producing incorrect outputs. Inspired by Benford's Law, a statistical pattern in which lower digits occur more frequently as leading digits, we hypothesize that the skewed digit distributions in web-collected corpora may be learned by LLMs during pretraining, leading to biased numerical generation. To investigate this hypothesis, we first examine whether digit frequencies in a pretraining corpus (OLMo2) follow Benford's Law. We then construct an evaluation benchmark in which the ground-truth digits are uniformly distributed within each of seven numerical reasoning tasks. Our evaluation results demonstrate that leading open-source LLMs show a consistent pattern of digit bias that resembles Benford's Law. Through logit-lens tracing and neuron-level dissection, we identify that this bias arises predominantly from a small subset of highly digit-selective feed-forward network (FFN) neurons in the deeper layers. Finally, we demonstrate that pruning these neurons mitigates imbalanced overgeneration and partially corrects erroneous outputs, providing causal evidence that fine-grained pretraining digit bias can propagate into model behavior. Our findings reveal a fundamental connection between corpus-level statistics and symbolic failure modes in LLMs, offering a new lens for diagnosing and mitigating hallucinations in numerical tasks.
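To make the reference distribution concrete, the following is a minimal sketch of Benford's Law, P(d) = log10(1 + 1/d) for leading digit d in 1..9, together with a simple way to compare an observed leading-digit distribution against it. The helper names (`benford_probs`, `leading_digit_freqs`, `total_variation`) and the choice of total variation distance are illustrative assumptions, not the paper's measurement procedure.

```python
import math
from collections import Counter


def benford_probs():
    """Benford's Law: P(d) = log10(1 + 1/d) for leading digit d in 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First nonzero digit of a nonzero number, ignoring sign and scale."""
    # Scientific notation puts the leading significant digit first.
    for ch in f"{abs(x):.15e}":
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def leading_digit_freqs(values):
    """Empirical leading-digit frequencies over an iterable of nonzero numbers."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def total_variation(p, q):
    """Total variation distance between two distributions over digits 1..9."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in range(1, 10))


if __name__ == "__main__":
    benford = benford_probs()
    # Powers of two are a classic sequence whose leading digits are Benford-like.
    powers = leading_digit_freqs([2 ** k for k in range(200)])
    # A uniform distribution, like the benchmark's ground-truth digits.
    uniform = {d: 1 / 9 for d in range(1, 10)}
    for d in range(1, 10):
        print(d, round(benford[d], 3), round(powers[d], 3))
    print("TV(powers of 2, Benford):", round(total_variation(powers, benford), 3))
    print("TV(uniform, Benford):    ", round(total_variation(uniform, benford), 3))
```

Under Benford's Law the digit 1 leads about 30% of the time and the digit 9 under 5%, which is the skew a model could pick up from its corpus even when the task's correct answers are uniform.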

Paper Structure

This paper contains 39 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Illustration of digit bias in identifying the last term of a numerical sequence. LLaMA3.1-8B-Instruct is asked to multiply the final term of a sequence by two, but first must identify the last item. In the first case, the model incorrectly selects a smaller intermediate value (1.81) instead of the actual final term (8.26). In the second, it correctly selects 1.05. This asymmetry suggests a bias toward smaller digits when the model is uncertain, revealing a subtle form of numerical hallucination. Detailed accuracy comparisons are shown in Figure 2.
  • Figure 2: Accuracy in the sequence last term identification task shown in Figure 1. Digits in the range 1-3 (low-value) are recognized with significantly higher accuracy compared to digits in the range 8-9 (high-value), indicating a generation bias toward smaller digits.
  • Figure 3: Digit bias observed in the Digit Bias Benchmark with Mistral-7B. (a) The boxplot shows the distribution of digits in generated answers across all tasks, revealing a significant overrepresentation of smaller digits despite the benchmark's uniform ground-truth distribution. (b) The distribution of digits at the first error position exhibits an even stronger skew toward smaller values, closely following Benford’s Law. Together, these results suggest that digit bias shapes not just digit preferences but also numerical hallucination.
  • Figure 4: (a) The histogram compares the digit distribution predicted by Benford's Law with that of the OLMo-mix-1124 corpus, showing their degree of similarity. (b) The heatmap of digit probabilities across layers obtained via Logit Lens. Smaller digits remain relatively undistinguished in early and middle layers, whereas larger digits show sharper activations earlier. This indicates that the overgeneration of small digits is driven by preferences formed in the final layers.
  • Figure 5: Digit probability trajectories in later layers. Starting from layers where digits 1, 3, and 6 have equal probabilities (layers 17, 22, and 25 respectively), we trace their subsequent evolution. Digit 1 consistently gains probability more rapidly than the others, suggesting that the model amplifies its bias toward smaller digits during final token prediction stages.
  • ...and 7 more figures
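
The layerwise digit probabilities described for Figures 4(b) and 5 come from a logit-lens style readout: each layer's hidden state is projected through the unembedding matrix and the probability mass on digit tokens is inspected. The sketch below illustrates that idea on a hand-made toy; the function names, the 2-dimensional hidden states, the 3-token vocabulary, and the omission of the final layer norm are all simplifying assumptions for illustration, not the paper's setup.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def logit_lens_digit_probs(hidden_states, unembed, digit_token_ids):
    """Per-layer digit probabilities from intermediate hidden states.

    hidden_states: one vector per layer (list of lists of floats).
    unembed: vocab_size x hidden_dim matrix as a list of rows.
    digit_token_ids: maps each digit to its (hypothetical) vocabulary index.
    Returns one dict per layer, with probabilities renormalized over digits.
    """
    trajectory = []
    for h in hidden_states:
        # Project the hidden state onto the vocabulary: logits = W_U h.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        probs = softmax(logits)
        digit_mass = {d: probs[i] for d, i in digit_token_ids.items()}
        total = sum(digit_mass.values())
        trajectory.append({d: p / total for d, p in digit_mass.items()})
    return trajectory


if __name__ == "__main__":
    # Toy vocabulary: index 0 is "1", index 1 is "9", index 2 is a non-digit token.
    unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    digit_token_ids = {1: 0, 9: 1}
    # Hand-made hidden states for three "layers"; the last one leans toward "1".
    hidden_states = [[0.0, 0.0], [0.5, 0.4], [2.0, 0.5]]
    for layer, probs in enumerate(
        logit_lens_digit_probs(hidden_states, unembed, digit_token_ids)
    ):
        print(layer, {d: round(p, 3) for d, p in probs.items()})
```

With a real model, the hidden states would be read from each transformer layer at the answer position, and the digit that pulls ahead in the last few layers is the one the final layers favor.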