Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models

Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphaël Millière, Ida Momennejad

TL;DR

This work investigates how in-context learning reshapes latent representations in large language models by employing neuroscience-inspired analysis. Using Representational Similarity Analysis (RSA) and a novel Attention Ratio Analysis (ARA), the authors examine embedding and attention changes in Vicuna-1.3 13B and Llama-2 70B across two tasks: linear regression and reading comprehension. They show that ICL induces task-relevant shifts in embedding similarity and focuses attention on informative prompt content, with these changes correlating with behavioral improvements. The proposed framework provides a scalable, interpretable approach to link latent representations to ICL performance, offering practical tools for model understanding and design in real-world applications.

Abstract

Large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL) by leveraging task-specific examples in the input. However, the mechanisms behind this improvement remain elusive. In this work, we investigate how LLM embeddings and attention representations change following in-context learning, and how these changes mediate improvement in behavior. We employ neuroscience-inspired techniques such as representational similarity analysis (RSA) and propose novel methods for parameterized probing and for measuring the ratio of attention to relevant vs. irrelevant information in Llama-2 70B and Vicuna-1.3 13B. We designed two tasks with a priori relationships among their conditions: linear regression and reading comprehension. We formed hypotheses about expected similarities in task representations and measured the hypothesis alignment of LLM representations before and after ICL, as well as changes in attention. Our analyses revealed a meaningful correlation between improvements in behavior after ICL and changes in both embeddings and attention weights across LLM layers. This empirical framework supports a nuanced understanding of how latent representations shape LLM behavior, offering valuable tools and insights for future research and practical applications.
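
The hypothesis-alignment measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the binary same-slope hypothesis matrix, the Pearson correlation over unique prompt pairs, the function names, and the toy two-dimensional "embeddings" are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def hypothesis_alignment(embeddings, labels):
    """Correlate embedding similarity (M) with a label-based hypothesis (H).

    Assumption: H is 1 for prompt pairs sharing a label (e.g. the same
    slope) and 0 otherwise, and only the unique off-diagonal pairs of the
    two matrices are compared.
    """
    pairs = list(combinations(range(len(labels)), 2))
    m = [cosine(embeddings[i], embeddings[j]) for i, j in pairs]
    h = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    return pearson(m, h)


# Hypothetical toy data: four prompts, two distinct slopes.
slopes = [1, 1, 2, 2]
before_icl = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
after_icl = [[1.0, 0.1], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0]]

alignment_before = hypothesis_alignment(before_icl, slopes)
alignment_after = hypothesis_alignment(after_icl, slopes)
print(alignment_before, alignment_after)
```

In this toy setup the "after" vectors group prompts by slope, so their alignment with the hypothesis is higher than for the "before" vectors. With a real model, the vectors would be prompt embeddings taken from a chosen layer.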

Paper Structure (23 sections, 1 equation, 10 figures)

Figures (10)

  • Figure 1: Experimental tasks and analyses. (a) We designed a linear regression task, in which the LLM is provided with a set of $x$ and $y$ coordinates that fall on the same line, is given a final $x$ and is asked to provide a $y$ so all points fall on the same line. (b) We designed a reading comprehension task with prompts about individuals doing different activities and a question about the activity of one of the individuals. Crucially, the prompts included both informative and distracting subsequences. (c) We constructed a hypothesis matrix about the similarity of different prompts for a given task. Regression: hypothesis was based on the line's slope. Reading comprehension: three hypothesis matrices were constructed based on name, activity, and their combination corresponding to the correct response. We demonstrated the correlation of the alignment between the hypothesis and the embedding similarity matrices with LLM behavior. (d) We computed the ratio of attention to informative components of the prompt before and after ICL and its correlation with LLM behavior.
  • Figure 2: Embedding similarity (M) and hypothesis (H) matrices for the regression task. (a) We constructed a hypothesis similarity matrix assuming that prompts about lines with the same slope would have similar embeddings. (b) and (c) We computed the actual prompt-to-prompt embedding similarity matrix (for a given layer, e.g., the last layer of Llama-2 here) for prompts with no ICL ($M_0$) and after the addition of ICL examples ($M_k$). Each row and column represents the embedding of a regression task prompt. We then computed the alignment of the H and M similarity matrices before and after ICL for multiple layers of each model (Figure 3).
  • Figure 3: Behavior, hypothesis alignment, and embedding probes for the regression task before and after ICL. (a) Increasing the number of ICL examples decreased the absolute error between the model's response $\hat{y}_T$ and the ground truth $y_T$ (see Equation 1) for Llama-2 and Vicuna-1.3. (b) A logistic regression probing classifier was trained to predict the line slope of regression prompts from the last layer's embedding. Decoding accuracy increased with ICL in both Llama-2 and Vicuna-1.3. (c) Behavior improvement in both models is correlated with the accuracy of the embedding classifier: the more slope information embedded in the model's representations, the smaller the model's mean absolute error in predicting $y_T$. (d) The correlation between our slope-based hypothesis matrix and the embedding similarity matrix increases with more ICL examples for both models (visualized for the middle layer). (e) and (f) Hypothesis alignment improved consistently with more ICL examples across LLM layers of varying depths, with the exception of the first layer.
  • Figure 4: Model behavior and hypothesis alignment before and after ICL for the reading comprehension task. (a) The accuracy of Llama-2 and Vicuna-1.3 in the reading comprehension task benefits significantly from ICL examples. (b) ICL significantly improves the alignment between Vicuna-1.3's embedding similarity matrix and three hypothesis matrices, $p < 0.05$: name-based similarity (prompts inquiring about the same individual are more similar), activity-based similarity (prompts whose correct answer includes the same activity are more similar), and the combined name and activity similarity. (c) ICL significantly increases the hypothesis alignment of Llama-2 embeddings for the activity and combined hypothesis matrices, $p < 0.05$.
  • Figure 5: Attention Ratio Analysis (ARA) before and after ICL for the reading comprehension task. (a) and (b) The ratio of attention to informative versus uninformative parts of the prompt was measured for the reading comprehension task before and after ICL (blue and orange, respectively) for Llama-2 70B and Vicuna-1.3 13B. Attention ratio distributions concentrated toward larger values indicate more attention to the informative part of the prompt. Attention ratios for the middle layer of both models shift significantly to larger values with the introduction of ICL, indicating more attention to informative content after ICL. (c) and (d) Attention ratios (x axis) in both models are significant indicators of each model's behavioral accuracy (y axis).
  • ...and 5 more figures
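
The attention ratio idea from the Figure 5 caption can be sketched as follows. The captions do not give the exact formula, so this sketch assumes the ratio is the share of attention mass on informative prompt tokens out of the total mass on informative plus distracting tokens. The function name, the token mask, and the weights are hypothetical and do not come from the paper.

```python
def attention_ratio(weights, informative_mask):
    """Share of attention mass that falls on informative prompt tokens.

    Assumption: `weights` holds one non-negative attention weight per
    prompt token (e.g. averaged over heads in one layer), and
    `informative_mask` marks which tokens belong to the informative
    subsequence. Values closer to 1 mean more attention to informative
    content.
    """
    if len(weights) != len(informative_mask):
        raise ValueError("weights and informative_mask must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("attention weights sum to zero")
    informative = sum(w for w, keep in zip(weights, informative_mask) if keep)
    return informative / total


# Hypothetical example: two informative tokens followed by two distractors.
mask = [True, True, False, False]
weights_before_icl = [0.2, 0.2, 0.3, 0.3]
weights_after_icl = [0.4, 0.4, 0.1, 0.1]

ratio_before = attention_ratio(weights_before_icl, mask)
ratio_after = attention_ratio(weights_after_icl, mask)
print(ratio_before, ratio_after)
```

Bounding the ratio between 0 and 1 keeps it defined when the distracting tokens receive no attention, which a plain informative-to-uninformative quotient would not. Other normalizations are equally plausible readings of the caption.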