States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun
TL;DR
This work asks how LLMs can carry out extended arithmetic implicitly, without step-by-step reasoning, and hypothesizes that Implicit Discrete State Representations (IDSRs) are hidden in the models' hidden states. Using a synthetic dataset of consecutive additions and multi-layer perceptron probes, the authors demonstrate that IDSRs exist, characterize their formation at the digit, sequence, and layer levels, and show that the models use IDSRs to produce final results. They report a two-stage regime across layers: the first ten layers (shallow-semantic) generate arithmetic content with high fidelity, while later layers (semantic) re-encode information under task context, enabling multi-hop calculations via an attention-bridge mechanism. The findings advance interpretability by showing how mid- to long-range computations may be anchored in discrete internal states. Current open-source models still lose some information across steps, which points to avenues for improving the reliability and scalability of implicit computation in LLMs.
Abstract
Large Language Models (LLMs) exhibit various emergent abilities, some of which may reveal the models' internal working mechanisms. In this paper, we uncover a novel emergent capability: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output the result of adding up to 15 two-digit numbers. We hypothesize that the model develops Implicit Discrete State Representations (IDSRs) within its hidden states and performs symbolic calculations internally. To test this hypothesis, we design a sequence of experiments that examine the hidden states. Specifically, we first confirm that IDSRs exist. Then, we report observations about how IDSRs form, from the layer, digit, and sequence perspectives. Finally, we confirm that models indeed use IDSRs to produce their final answers. However, we also find that these state representations are far from lossless in current open-source models, which leads to inaccuracies in their final performance. Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.
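To make the setup concrete, the consecutive-addition task and the intermediate states a probe would target can be sketched as follows. This is a minimal illustration, not the authors' code: the prompt format (`a+b+...=`), the helper names `make_addition_example` and `digit_labels`, the 10 to 99 addend range, and the four-digit label width are all assumptions made for the example.

```python
import random


def make_addition_example(num_addends, rng):
    """Build one consecutive-addition prompt plus its running partial sums.

    Hypothetical helper: the prompt format is an assumption, not the
    paper's actual template.
    """
    if not 2 <= num_addends <= 15:
        raise ValueError("num_addends must be between 2 and 15")
    # Two-digit addends, assumed here to mean integers in [10, 99].
    addends = [rng.randint(10, 99) for _ in range(num_addends)]
    prompt = "+".join(str(a) for a in addends) + "="
    # Running partial sums: the discrete state hypothesised to be
    # represented after each addend has been read.
    states = []
    total = 0
    for a in addends:
        total += a
        states.append(total)
    return {"addends": addends, "prompt": prompt,
            "states": states, "answer": total}


def digit_labels(value, width=4):
    """Split a partial sum into per-digit targets, least significant first.

    width=4 is enough for 15 two-digit addends (at most 15 * 99 = 1485).
    """
    return [(value // 10 ** i) % 10 for i in range(width)]


if __name__ == "__main__":
    example = make_addition_example(5, random.Random(0))
    print(example["prompt"], example["answer"])
    for state in example["states"]:
        print(state, digit_labels(state))
```

In a probing experiment of the kind described, each per-digit label would serve as the classification target for a probe trained on the hidden state at the corresponding token position and layer; the probe itself is omitted here.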
