Information Anxiety in Large Language Models

Prasoon Bajpai, Sarah Masud, Tanmoy Chakraborty

TL;DR

This work focuses on three critical dimensions - the impact of entity popularity, the models' sensitivity to lexical variations in query formulation, and the progression of hidden state representations across LLM layers.

Abstract

Large Language Models (LLMs) have demonstrated strong performance as knowledge repositories, enabling models to understand user queries and generate accurate and context-aware responses. Extensive evaluation setups have corroborated the positive correlation between the retrieval capability of LLMs and the frequency of entities in their pretraining corpus. We take the investigation further by conducting a comprehensive analysis of the internal reasoning and retrieval mechanisms of LLMs. Our work focuses on three critical dimensions - the impact of entity popularity, the models' sensitivity to lexical variations in query formulation, and the progression of hidden state representations across LLM layers. Our preliminary findings reveal that popular questions facilitate early convergence of internal states toward the correct answer. However, as the popularity of a query increases, retrieved attributes across lexical variations become increasingly dissimilar and less accurate. Interestingly, we find that LLMs struggle to disentangle facts, grounded in distinct relations, from their parametric memory when dealing with highly popular subjects. Through a case study, we explore these latent strains within LLMs when processing highly popular queries, a phenomenon we term information anxiety. The emergence of information anxiety in LLMs underscores the risk of adversarial injection through linguistic variations and calls for a more holistic evaluation of frequently occurring entities.

Paper Structure

This paper contains 29 sections, 16 equations, 6 figures.

Figures (6)

  • Figure 1: Overview of our findings. The figure demonstrates our proposed experimental setup, where we prompt models from the Llama-2 family with factual questions from PopQA in an in-context learning setting. We intercept the hidden states after each decoder block and invert them to obtain the next-token probability distribution. We find that for questions based on highly popular entities, the predicted probability distributions converge rapidly to the target probability distributions. (a) The difference in the attention scores given to a set of specific tokens, based on the popularity of questions; lower-popularity questions receive higher attention scores. (b) The similarity of the facts retrieved across different lexical variations of the same question, depending on the question's popularity; dissimilarity across lexical variations is higher when the question's popularity is high. Our findings expose critical internal strains in LLMs when addressing highly prevalent information, a phenomenon we term information anxiety.
  • Figure 2: Variety in responses across different scales of popularity. For each question, we capture the variety in responses by calculating the F1 scores of answers across the different lexical variants. For the 'Screenwriter' relation across four models, the variety of responses increases with the popularity of questions, suggesting that responses are sensitive to the lexical structure of queries based on highly popular entities. The figure shows results for the following models: (a) Llama-2-7B, (b) Llama-2-7B-chat, (c) Llama-2-13B, and (d) Llama-2-13B-chat.
  • Figure 3: Convergence of the predicted probability distribution. The figure shows the KL-Divergence between the predicted next-token probability distribution and the golden probability distribution for questions of the capital and genre relations in PopQA. For each batch of questions (Head, Torso or Tail), the solid line denotes the average distance between the predicted and the golden distributions across all questions in the batch. We also account for the effect of lexical variations on the next-token probability distribution: the shaded regions denote the average impact of such variations (captured using deviations in the KL-Divergence distances) across all questions in a batch. We observe that the predicted probability distribution associated with highly popular questions approaches the golden probability distribution earlier than that associated with comparatively less popular questions. The figure shows results for the following model and relation combinations: (a) Llama-2-7B + capital, (b) Llama-2-7B-chat + capital, (c) Llama-2-7B + genre, and (d) Llama-2-7B-chat + genre. All remaining results can be found in Supplementary, Section 2.1.
  • Figure 4: Effect of popularity on LLM reasoning behaviour. For each lexical variant of a question, we calculate a measure of attention scores over each constraint token. For each question, we take the mean of these attention scores across all its lexical variants and show it with solid lines for each layer under study. We also take the standard deviation across all of the lexical variants of a question and show it using error bars for each layer. Head denotes the batch of highly popular questions while Tail denotes the batch of less popular questions. The attention score, as well as its sensitivity to lexical variation (observed from the standard deviation), appears to be highest in the middle layers of processing. Furthermore, we observe that between Head and Tail, a lower attention score is associated with the batch of highly popular questions. The figure shows the results for the following model + relation combinations: (a) Llama-2-7B + Capital, (b) Llama-2-7B + Occupation, (c) Llama-2-7B-chat + Capital, (d) Llama-2-7B-chat + Occupation. All results can be found in Supplementary, Section 2.2.
  • Figure 5: Effect of popularity on LLM fact-retrieval behaviour. (a) and (b) demonstrate the similarity between the facts retrieved for different lexical variations of each question. We observe a lower similarity between the facts retrieved across different lexical variations of a prevalent question. (c) and (d) add an extra axis showing the batch-wise representation of the target token probability score finally sampled from the predicted probability distribution at each layer. Although the retrieved facts are inconsistent, LLMs stay confident in their prediction despite their sensitivity to lexical variations of highly popular questions. The figure shows results for the following models: (a), (c) Llama-2-7B + Capital, (b), (d) Llama-2-13B + Capital. Results for the other LLMs can be found in Supplementary, Section 2.3.
  • ...and 1 more figure
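
The two measurements named in the captions for Figures 2 and 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the per-layer logits are assumed to have already been obtained by projecting intermediate hidden states onto the vocabulary, the KL direction is assumed to be KL(golden || predicted), and the F1 is assumed to be a whitespace-token F1 averaged over pairs of answers.

```python
import math
from collections import Counter
from itertools import combinations


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def layerwise_kl(layer_logits, golden):
    """Distance from the golden distribution at each layer.

    layer_logits: one logit vector per decoder block (assumed precomputed).
    golden: target next-token distribution (assumed, e.g. one-hot).
    """
    return [kl_divergence(golden, softmax(logits)) for logits in layer_logits]


def token_f1(a, b):
    """Token-level F1 between two answer strings (assumed definition)."""
    ta, tb = a.lower().split(), b.lower().split()
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(answers):
    """Average F1 over all pairs of answers to lexical variants of a question.

    Lower values mean the variants produced more varied answers.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # a single answer trivially agrees with itself
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 3-token vocabulary; the target is token 0.
    toy_logits = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
    print(layerwise_kl(toy_logits, [1.0, 0.0, 0.0]))
    print(mean_pairwise_f1(["Paris", "Paris", "Lyon"]))
```

In this toy run the per-layer KL values shrink as the logits sharpen toward the target token, which is the kind of convergence the Figure 3 caption describes; how early that happens for Head versus Tail questions is an empirical result of the paper and is not reproduced here.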