Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
Siyu Lou, Yuntian Chen, Xiaodan Liang, Liang Lin, Quanshi Zhang
TL;DR
The paper disentangles memorization from in-context reasoning in large language models by introducing an axiomatic framework that decomposes the LLM confidence score $v(x_{n+1}|\mathbf{x})$ into AND/OR interactions. Context-agnostic memorization is formalized as foundational and chaotic components, and in-context reasoning as enhanced, eliminated, or reversed components. The decomposition is constrained by two axioms (condition dependence and variable independence) and supported by the sparsity and universal-matching properties. The authors prove that the decomposed effects faithfully reconstruct the model output, and they provide a practical protocol that quantifies these effects via logically equivalent prompts, enabling fine-grained analyses across multiple models (OPT-1.3B, LLaMA-7B, GPT-3.5-Turbo). Empirically, in-context reasoning effects are substantial while chaotic memorization effects are relatively weak; premises mainly suppress high-order memorization interactions and only slightly modify low-order ones. These findings support semantic debugging and adversarial prompt design. Overall, the framework offers interpretable, faithful insight into LLM inference, with practical uses in debugging and robust prompt engineering.
Abstract
In this study, we propose an axiomatic system to precisely define and quantify the memorization effects and in-context reasoning effects that a large language model (LLM) uses for language generation. These effects are formulated as non-linear interactions between tokens/words encoded by the LLM. Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects, and to further classify in-context reasoning effects into enhanced inference patterns, eliminated inference patterns, and reversed inference patterns. Moreover, the decomposed effects satisfy the sparsity property and the universal matching property, which mathematically guarantee that the LLM's confidence score can be faithfully decomposed into memorization effects and in-context reasoning effects. Experiments show that this clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of the detailed inference patterns encoded by LLMs.
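To make the faithfulness claim concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Harsanyi (AND) interaction obtained by Möbius inversion over token subsets, leaves out OR interactions and the memorization/reasoning split entirely, and replaces the LLM confidence score with a hypothetical toy function `toy_score` over three input tokens. All function names are illustrative.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """AND interaction of each subset S of n tokens:
    I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def reconstruct(interactions, T):
    """Rebuild v(T) by summing the effects of all patterns contained in T."""
    return sum(effect for S, effect in interactions.items() if S <= T)


def toy_score(T):
    """Hypothetical stand-in for the confidence score when only tokens in T
    are kept unmasked. A real experiment would query the LLM here."""
    score = 0.5                # baseline with every token masked
    if 0 in T:
        score += 1.0           # token 0 acts alone
    if {1, 2} <= T:
        score += 2.0           # tokens 1 and 2 act only jointly
    if {0, 1, 2} <= T:
        score -= 0.75          # a three-token pattern lowers the score
    return score


interactions = and_interactions(toy_score, n=3)
salient = {S: w for S, w in interactions.items() if w != 0}

for S, w in sorted(salient.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(S), w)
```

In this toy setting only four of the eight subsets carry a non-zero effect, which mirrors the sparsity property in miniature, and `reconstruct(interactions, T)` equals `toy_score(T)` for every one of the eight masked inputs, which is the kind of guarantee the universal matching property refers to. The enumeration is exponential in the number of tokens, so it is only practical for short inputs.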
