Probing Language Models on Their Knowledge Source

Zineddine Tighidet, Andrea Mogini, Jiali Mei, Benjamin Piwowarski, Patrick Gallinari

TL;DR

This paper proposes a novel probing framework to explore the mechanisms governing the selection between parametric knowledge (PK) and contextual knowledge (CK) in LLMs. It demonstrates that mid-layer activations are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.

Abstract

Large Language Models (LLMs) often encounter conflicts between their learned, internal knowledge (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on various LLMs of different sizes and demonstrate that mid-layer activations, particularly those related to relations in the input, are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
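
To make the probing idea concrete, the following is a minimal sketch of a linear probe, not the authors' implementation. It assumes activations have already been extracted into a matrix `X` (one row per prompt) with labels `y` (here 0 for PK and 1 for CK, an arbitrary convention). The activations are synthetic, and the function names, dimensions, and hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np

def make_synthetic_activations(n=400, d=16, seed=0):
    """Stand-in for real model activations (assumption: synthetic data).

    Each class is shifted along one random direction, mimicking a case
    where the knowledge source is linearly decodable.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)              # 0 = PK, 1 = CK (arbitrary)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    X = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * y - 1, direction)
    return X, y

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        grad = p - y                            # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Return hard 0/1 predictions from the linear probe."""
    return (X @ w + b > 0).astype(int)

X, y = make_synthetic_activations()
split = 300                                     # simple train / held-out split
w, b = train_linear_probe(X[:split], y[:split])
accuracy = (predict(X[split:], w, b) == y[split:]).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In a real experiment, `X` would hold activations taken from a given layer and token position, and the held-out accuracy would be compared across layers to see where the knowledge source becomes decodable.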

Paper Structure

This paper contains 24 sections, 9 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of our method for probing knowledge sources in LLMs. We present the model with a prompt containing contradictory information to its learned knowledge to test whether it uses parametric knowledge (PK) or contextual knowledge (CK). The resulting activations are used to train a classifier to distinguish between PK and CK.
  • Figure 2: Example of the template used to generate the parametric knowledge dataset. The blue text is specific to the relation and the orange text is specific to a subject-relation example in the ParaRel dataset (Elazar et al., 2021).
  • Figure 3: Example of 3 counter-knowledge objects associated with a parametric knowledge element. The probability distribution is sorted in descending order, and we selected the objects with the lowest probabilities.
  • Figure 4: Count of used knowledge sources by each model (CK, PK, and ND). ND refers to outputs where the knowledge source is not defined.
  • Figure 5: Performance of the linear classifier in identifying knowledge sources across different layers and modules (MLP-L2, MLP-L1, MHSA). The plots show success rates for classifiers trained on activations from object, subject, and relation tokens, with the first token used as a control (see the control experiment section for more details). Results are reported for the Mistral-7B, Phi-1.5, Llama3-8B, and Pythia-1.4B models. Solid lines represent the average success rates across relation groups, while shaded areas denote the weighted standard error with a 95% confidence interval. See the evaluation section for further details on the evaluation methodology.
  • ...and 2 more figures
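
The selection step described in the Figure 3 caption (picking the objects to which the model assigns the lowest probabilities) can be sketched as follows. This is an illustrative guess at the procedure, not the paper's code: the function name `select_counter_knowledge`, the example objects, and the probability values are all made up for the example.

```python
def select_counter_knowledge(object_probs, parametric_object, k=3):
    """Return the k candidate objects with the lowest model probability.

    object_probs: dict mapping candidate object -> probability (hypothetical input).
    parametric_object: the object the model produces from its own knowledge;
    it is excluded so that it is never chosen as counter-knowledge.
    """
    candidates = {o: p for o, p in object_probs.items() if o != parametric_object}
    ranked = sorted(candidates.items(), key=lambda item: item[1])  # ascending
    return [obj for obj, _ in ranked[:k]]

# Made-up distribution over candidate objects for a single prompt.
probs = {"Paris": 0.90, "Lyon": 0.05, "Rome": 0.03, "Tokyo": 0.015, "Lima": 0.005}
counter_objects = select_counter_knowledge(probs, "Paris", k=3)
print(counter_objects)  # ['Lima', 'Tokyo', 'Rome']
```

Choosing low-probability objects makes the injected context clearly contradict what the model would otherwise predict, which is the property the controlled prompts rely on.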