Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning

Yuval Shalev, Amir Feder, Ariel Goldstein

TL;DR

This work investigates whether large language models perform distributional, parallel multi-hop reasoning rather than a single-step inference. It introduces a two-stage, linear-approximation framework in which middle-layer intermediate-answer activations (A1) linearly predict final-answer activations (A2) via a subject-invariant matrix Q2, and demonstrates interpretable, phase-transition dynamics in hidden embeddings. Using the Compositional Celebrities dataset and a Hallucinations variant, the authors show that after two-thirds depth the A1-to-A2 relationship is strong (mean R^2 > 0.5), that intermediate activations are interpretable and aligned with final outputs, and that the same reasoning process generalizes to fictitious or out-of-distribution subjects. These findings suggest a cognitive-inspired blend of association and propositional reasoning in LLMs and provide a framework for probing internal thought processes. The results advance cognitive modeling of AI by linking activations to parallel reasoning paths and offering interpretable diagnostics for reasoning strategies.

Abstract

Large language models (LLMs) have shown an impressive ability to perform tasks believed to require thought processes. When the model does not document an explicit thought process, it becomes difficult to understand the processes occurring within its hidden layers and to determine if these processes can be referred to as reasoning. We introduce a novel and interpretable analysis of internal multi-hop reasoning processes in LLMs. We demonstrate that the prediction process for compositional reasoning questions can be modeled using a simple linear transformation between two semantic category spaces. We show that during inference, the middle layers of the network generate highly interpretable embeddings that represent a set of potential intermediate answers for the multi-hop question. We use statistical analyses to show that a corresponding subset of tokens is activated in the model's output, implying the existence of parallel reasoning paths. These observations hold true even when the model lacks the necessary knowledge to solve the task. Our findings can help uncover the strategies that LLMs use to solve reasoning tasks, offering insights into the types of thought processes that can emerge from artificial intelligence. Finally, we discuss the implications of these results for cognitive modeling.
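
The linear-transformation claim can be illustrated with a minimal sketch. It assumes intermediate-answer activations are stored as a matrix `a1` (prompts by intermediate answers) and final-answer activations as `a2` (prompts by final answers); these names, the helper `kfold_r2`, and the synthetic data are illustrative assumptions, not the authors' code or results. The sketch fits an ordinary least-squares map from `a1` to `a2` and reports held-out R^2 with 5 folds, the fold count mentioned in the Figure 3 caption.

```python
import numpy as np


def kfold_r2(a1, a2, k=5, seed=0):
    """Mean out-of-fold R^2 of a linear map a1 -> a2, averaged over a2 columns."""
    rng = np.random.default_rng(seed)
    n = a1.shape[0]
    folds = np.array_split(rng.permutation(n), k)
    preds = np.empty_like(a2, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Append a bias column, then solve least squares on the training fold only.
        x_train = np.hstack([a1[train_mask], np.ones((train_mask.sum(), 1))])
        x_test = np.hstack([a1[test_idx], np.ones((len(test_idx), 1))])
        weights, *_ = np.linalg.lstsq(x_train, a2[train_mask], rcond=None)
        preds[test_idx] = x_test @ weights
    # R^2 per final-answer column, computed on held-out predictions.
    ss_res = ((a2 - preds) ** 2).sum(axis=0)
    ss_tot = ((a2 - a2.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


# Synthetic stand-ins (assumption): real inputs would be logits read from the model.
rng = np.random.default_rng(0)
n_prompts, n_intermediate, n_final = 200, 12, 9
a1 = rng.normal(size=(n_prompts, n_intermediate))
q2 = rng.normal(size=(n_intermediate, n_final))  # hypothetical subject-invariant map
a2 = a1 @ q2 + 0.5 * rng.normal(size=(n_prompts, n_final))

mean_r2 = kfold_r2(a1, a2)
print(f"mean out-of-fold R^2: {mean_r2:.3f}")
```

Scoring on held-out folds rather than the training data keeps the R^2 from being inflated when the number of intermediate answers is large relative to the number of prompts.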

Paper Structure

This paper contains 21 sections, 5 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Illustration of possible strategies to answer the question: What is the first letter of the name of the color of a common banana?: (a) The extraction of the color attribute creates a bridge entity from which the second attribute will be extracted; (b) Only a single extraction of the specific attribute, first letter of the name of the color, is performed; (c) The words banana, color, and letter are statistically related to the output y; (d) The extraction of the color attribute results in a distribution of bridge entities. From these entities, the second attribute will be extracted.
  • Figure 2: An example of distributional reasoning in Llama-2-13B using the prompt "What is the first letter of the name of the color of a common banana? The first letter is ". We projected the embeddings from the hidden layers into the vocabulary space and analyzed the activation pattern of the intermediate and final answers. (a) The dashed lines represent activations of intermediate answers $\vec{A1}$ (color names), while the solid lines represent the activations of final answers $\vec{A2}$ (letters) by layer. A phase transition in the activation patterns is observed around layer 30. (b) Activations of intermediate answers $\vec{A1}$ (colors) extracted from layer 25 (x-axis), compared to activations of final answers $\vec{A2}$ (letters) extracted from the last layer (y-axis).
  • Figure 3: Tokens of the intermediate answer $\vec{A1}$ can approximate the tokens of the final answers $\vec{A2}$ using a linear transformation. We fitted regression models using k-fold cross-validation (k=5) to predict $\vec{A2}$ from $\vec{A1}$. Results using Llama-2-13B: (a) Our model predictions for question type "callingcode". This model predicts the activation of possible first digits (1-9) using the activation of 117 countries from layer 25. x-axis: predicted $\vec{A2}$ activations; y-axis: real $\vec{A2}$ activations. Each color represents a different digit (mean $R^2=0.86$). (b) Mean $R^2$ (with error bars denoting standard deviations normalized by the square root of the group size) of our model across 14 question types, calculated for each layer separately. In blue: mean $R^2$ of the models using the logits of $\vec{A1}$ as predictors. In orange: mean $R^2$ of the models using the logits of $\vec{A2}$ as predictors. On average, the intermediate category $\vec{A1}$ was more informative about the final answers.
  • Figure 4: There is a high correlation between the activation patterns of $\vec{A1}$ and $\vec{A2}$. Results of Llama-2-13B on the entire dataset: (a) The embeddings from the middle layers primarily represent $\vec{A1}$ (dashed lines). Then, a phase transition occurs, and the embeddings from the final layers primarily represent the $\vec{A2}$ logits (solid lines). The colors indicate pairs of intermediate answers (country names), and their corresponding correct final answers (e.g., capitals). (b) Both categories are sorted identically: The x-axis displays $\vec{A1}$ activations from layer 25, while the y-axis shows $\vec{A2}$ activations from the final layer. (c) Mean Spearman correlations (with error bars denoting standard deviations normalized by the square root of the group size) across 14 question types by model depth.
  • Figure 5: The reasoning process is dissociated from the model's training data. Our linear models generalize to prompts about fictitious subjects, indicating that the same reasoning process occurs within the model, regardless of the subject. We used the Ridge regularization method to fit linear models on the original dataset. We then tested these models on modified questions about fictitious celebrity names. Results using Llama-2-13B: (a) Our model generalization results (layer 25) on question type "callingcode" (mean $R^2=0.61$). (b) Mean $R^2$ (with error bars denoting standard deviations normalized by the square root of the group size) of the fictitious subjects experiments across 14 question types, calculated for each layer separately.
  • ...and 4 more figures
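
The summary statistic described in the Figure 4 caption (a mean Spearman correlation across question types, with error bars equal to the standard deviation divided by the square root of the group size) can be sketched as follows. The helper names, the synthetic activations, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; the captions do not specify these details, and the numbers produced here are not the paper's results.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation for 1-D arrays without ties (Pearson on ranks)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def mean_and_sem(values):
    """Mean and standard deviation divided by sqrt(group size)."""
    values = np.asarray(values, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not stated in the captions.
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(len(values)))


# Synthetic stand-ins (assumption): one (A1, A2) activation pair per question type.
rng = np.random.default_rng(0)
n_question_types, n_answers = 14, 50
rhos = []
for _ in range(n_question_types):
    a1 = rng.normal(size=n_answers)                      # intermediate-answer activations
    a2 = np.tanh(a1) + 0.3 * rng.normal(size=n_answers)  # monotone link plus noise
    rhos.append(spearman(a1, a2))

mean_rho, sem_rho = mean_and_sem(rhos)
print(f"mean Spearman rho = {mean_rho:.3f} +/- {sem_rho:.3f}")
```

A rank correlation is a natural choice here because it only asks whether the two categories are ordered alike, without requiring the relationship between their activations to be linear.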