Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphaël Millière, Ida Momennejad
TL;DR
This work uses neuroscience-inspired analysis to investigate how in-context learning (ICL) reshapes latent representations in large language models. Using Representational Similarity Analysis (RSA) and a novel Attention Ratio Analysis (ARA), the authors examine how embeddings and attention change in Vicuna-1.3 13B and Llama-2 70B on two tasks: linear regression and reading comprehension. They show that ICL shifts embedding similarity in task-relevant ways and focuses attention on informative prompt content, and that these changes correlate with behavioral improvements. The proposed framework offers an interpretable way to link latent representations to ICL performance.
Abstract
Large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL) by leveraging task-specific examples in the input. However, the mechanisms behind this improvement remain elusive. In this work, we investigate how LLM embeddings and attention representations change following in-context learning, and how these changes mediate improvement in behavior. We employ neuroscience-inspired techniques such as representational similarity analysis (RSA) and propose novel methods for parameterized probing and for measuring the ratio of attention to relevant vs. irrelevant information in Llama-2 70B and Vicuna 13B. We designed two tasks with a priori relationships among their conditions: linear regression and reading comprehension. We formed hypotheses about expected similarities in task representations and measured the hypothesis alignment of LLM representations before and after ICL, as well as changes in attention. Our analyses revealed a meaningful correlation between improvements in behavior after ICL and changes in both embeddings and attention weights across LLM layers. This empirical framework supports a more nuanced understanding of how latent representations shape LLM behavior, offering tools and insights for future research and practical applications.
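As a rough illustration of the two measurements named above, the following Python sketch computes an RSA-style hypothesis alignment and a relevant-to-irrelevant attention ratio on toy data. It is not the authors' implementation. The function names, the use of cosine similarity and Pearson correlation, and the definition of the ratio as mean attention on relevant tokens divided by mean attention on irrelevant tokens are all assumptions made for this example.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors (one row per prompt)."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return x @ x.T


def hypothesis_alignment(embeddings, hypothesis):
    """Correlate the embedding similarity matrix with a hypothesis matrix.

    Assumption: Pearson correlation over the strict upper triangle.
    Rank correlations are also common in RSA.
    """
    sim = cosine_similarity_matrix(embeddings)
    upper = np.triu_indices(sim.shape[0], k=1)  # skip diagonal and duplicates
    hyp = np.asarray(hypothesis, dtype=float)
    return float(np.corrcoef(sim[upper], hyp[upper])[0, 1])


def attention_ratio(attention_weights, relevant_mask):
    """Mean attention on relevant tokens over mean attention on the rest.

    Assumption: this definition is illustrative, not taken from the paper.
    """
    weights = np.asarray(attention_weights, dtype=float)
    mask = np.asarray(relevant_mask, dtype=bool)
    return float(weights[mask].mean() / weights[~mask].mean())


# Toy data: four prompts, where the first two and the last two share a condition.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_hypothesis = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
alignment = hypothesis_alignment(toy_embeddings, toy_hypothesis)

# Toy attention from one response token to four prompt tokens.
toy_attention = [0.1, 0.4, 0.4, 0.1]
toy_relevant = [False, True, True, False]
ratio = attention_ratio(toy_attention, toy_relevant)

print(f"hypothesis alignment: {alignment:.3f}")
print(f"attention ratio: {ratio:.3f}")
```

In an analysis of this kind, both quantities would be computed per layer with and without in-context examples, and the differences then compared against the change in task accuracy.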
