Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Mehrdad Farahani, Richard Johansson

TL;DR

This study uses causal mediation analysis and controlled experiments to examine how internal representations influence information processing in Atlas, a retrieval-augmented generation (RAG) model. The results indicate that when the model can choose between parametric knowledge and the retrieved context, it relies more on the context.

Abstract

Generative language models often struggle with specialized or less-discussed knowledge. A potential solution is found in Retrieval-Augmented Generation (RAG) models, which retrieve information before generating responses. In this study, we explore how Atlas, a RAG model, decides between what it already knows (parametric) and what it retrieves (non-parametric). We use causal mediation analysis and controlled experiments to examine how internal representations influence information processing. Our findings disentangle the effects of parametric knowledge and the retrieved context. They indicate that when the model can choose between both types of information (parametric and non-parametric), it relies more on the context than on its parametric knowledge. Furthermore, the analysis investigates the computations involved in how the model uses the information from the context. We find that multiple mechanisms are active within the model and can be detected with mediation analysis: first, the decision of whether the context is relevant, and second, how the encoder computes output representations to support copying when the context is relevant.

Paper Structure

This paper contains 20 sections, 3 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: The first row represents the first experiment and the second row the second experiment, both illustrated for subject tokens in the "restoration run". (a) shows the representations with injected counterfactual embeddings for the query "What is the capital of Iran?" (b) depicts how restoration occurs at token $i$ and layer $l$. The second experiment closely follows Meng et al. (2022): (c) shows the representations when the object tokens ("Tehran") are replaced with a counterfactual ("Rome"), and (d) demonstrates how restoration occurs after adding noise to the subject tokens ("Iran"). The second experiment is implemented analogously for relation tokens.
  • Figure 2: AIE results for the copying behavior in Atlas across different modules and layers. (a--c) show the AIEs of hidden states ($h^{(l)}$), MLP, and Attention modules over all data points, indicating that the object tokens are the dominant component in copying behavior. (d--i) similarly show the AIEs for the second experiment on subject and relation tokens, respectively, highlighting the vital role of these two components in determining context relevance.
  • Figure 3: The left side illustrates the TE distribution across parametric and non-parametric behaviors, while the right side shows the overall distribution. The dominant distribution, represented in orange, indicates that the model's responses shift towards counterfactuals when the contexts are altered. Similarly, it reflects the model's general tendency to rely on the context to extract the answer (essentially, copying from the context).
  • Figure 4: The impact of MLP and Attention in both earlier experiments, averaged over all tokens in the subject, object, and relation token sets. (a--c) show that MLP blocks from the early to the middle layers are key contributors to the model's ability to translate object token representations from the encoder space to the decoder, while Attention plays a minor role in the later layers. (d--i) depict the contribution of both model components from the early to the later layers, aligning with the processing of context relevance and the extraction of object tokens.
  • Figure 5: The TE distribution across subject and relation tokens.
  • ...and 1 more figures
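
The clean, corrupted, and restoration runs described in the Figure 1 caption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ToyModel` is a tiny NumPy network and not Atlas, the names `causal_trace`, `te`, and `ie` are hypothetical, and the definitions used (TE as the drop in answer probability between the clean and corrupted runs, IE as the probability recovered by restoring one clean hidden state at token $i$ and layer $l$) follow the usual causal-tracing convention rather than the paper's own equations, which are not reproduced on this page.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


class ToyModel:
    """Hypothetical stand-in for a layered model; NOT the Atlas architecture."""

    def __init__(self, n_layers=3, d=8, n_vocab=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.normal(scale=0.5, size=(d, d)) for _ in range(n_layers)]
        self.U = rng.normal(size=(d, n_vocab))

    def forward(self, emb, patch=None):
        """emb: (n_tokens, d). patch: optional {(layer, token): hidden vector}."""
        h = emb
        states = []
        for l, W in enumerate(self.W):
            # Crude token mixing so that every token can influence the others.
            h = np.tanh((h + h.mean(axis=0, keepdims=True)) @ W)
            if patch:
                for (pl, pi), vec in patch.items():
                    if pl == l:
                        h[pi] = vec  # overwrite this hidden state with a saved one
            states.append(h)
        probs = softmax(h.mean(axis=0) @ self.U)
        return probs, states


def causal_trace(model, clean_emb, corrupt_emb, answer):
    """Return (TE, IE) where IE[l, i] is the effect of restoring token i at layer l."""
    p_clean, clean_states = model.forward(clean_emb)
    p_corr, _ = model.forward(corrupt_emb)
    te = p_clean[answer] - p_corr[answer]

    n_layers, n_tokens = len(clean_states), clean_emb.shape[0]
    ie = np.zeros((n_layers, n_tokens))
    for l in range(n_layers):
        for i in range(n_tokens):
            # Restoration run: corrupted input, one clean hidden state patched in.
            p_rest, _ = model.forward(corrupt_emb, patch={(l, i): clean_states[l][i]})
            ie[l, i] = p_rest[answer] - p_corr[answer]
    return te, ie


# Usage: corrupt one token's embedding with Gaussian noise and trace its effect.
rng = np.random.default_rng(1)
model = ToyModel()
clean = rng.normal(size=(4, 8))
corrupt = clean.copy()
corrupt[1] += rng.normal(scale=3.0, size=8)  # noise on a hypothetical "subject" token
te, ie = causal_trace(model, clean, corrupt, answer=2)
```

Averaging `ie` over many examples would give a per-layer, per-token map in the spirit of the AIE plots in Figure 2. In a real model, the patching would typically be done with forward hooks on the encoder or decoder layers rather than inside a hand-written forward pass.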