Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models

Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang

TL;DR

The paper investigates how large language models balance their internal parametric knowledge (PK) with externally provided contextual knowledge (CK). By introducing EchoQA, a benchmark that spans scientific, factual, and commonsense domains, the authors categorize CK–PK interactions into four types and examine model behavior under progressively enforced reasoning instructions. Across complementary, conflicting, and irrelevant CK scenarios, they find universal PK suppression by CK and only partial recovery of PK leverage through instructions, revealing a reliability vulnerability in knowledge-intensive tasks. The work highlights factors such as knowledge type and entity popularity that modulate PK recall and suggests directions like agent-based recall-before-reasoning and post-training integration to improve PK–CK fusion. Overall, EchoQA provides a valuable testbed for understanding and enhancing the integration of PK and CK in modern LLMs.

Abstract

Large language models (LLMs) encode vast amounts of knowledge during pre-training (parametric knowledge, or PK) and can further be enhanced by incorporating contextual knowledge (CK). Can LLMs effectively integrate their internal PK with external CK to solve complex problems? In this paper, we investigate the dynamic interaction between PK and CK, categorizing their relationships into four types: Supportive, Complementary, Conflicting, and Irrelevant. To support this investigation, we introduce EchoQA, a benchmark spanning scientific, factual, and commonsense knowledge. Our results show that LLMs tend to suppress their PK when contextual information is available, even when it is complementary or irrelevant. While tailored instructions can encourage LLMs to rely more on their PK, they still struggle to fully leverage it. These findings reveal a key vulnerability in LLMs, raising concerns about their reliability in knowledge-intensive tasks. Resources are available at https://github.com/sitaocheng/Knowledge_Interplay

Paper Structure

This paper contains 24 sections, 2 equations, 8 figures, 17 tables.

Figures (8)

  • Figure 1: Our benchmark EchoQA, assessing LLMs' ability to echo their parametric knowledge (PK) when contextual knowledge (CK) is present. We first question LLMs to obtain their PK and discard knowledge they cannot answer. Then, we construct CK according to the reasoning types (Table \ref{tab:type_and_metrics}). Next, we question the LLMs again with the CK given. The example result is from Llama 3.1-70B on ALCUNA (Yin et al., 2023).
  • Figure 2: Accuracy for Complementary Reasoning. "w/o Knowledge" means that no information is given, and "Golden Knowledge" means that all the required information is given. Compared with the orange bar, the upward trend shows that LLMs suppress PK even when the CK is complementary.
  • Figure 3: Memorization Ratio for Conflicting Reasoning. LLMs rarely trust themselves (PK) when faced with conflicting CK, though instructions modulate their preference to some extent.
  • Figure 4: Accuracy for Irrelevant Reasoning, showing that LLMs rely on CK even when it is irrelevant and that instructions can substantially modulate how they leverage knowledge.
  • Figure 5: Memorization Ratio on ConflictQA across popularity categories for representative models, showing that LLMs recall knowledge about popular entities better.
  • ...and 3 more figures
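
The procedure described in the Figure 1 caption (query the model closed-book, keep only what it answers correctly, then query again with CK attached) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Item` record, the `ask` callable, the prompt template, the substring-based `is_correct` scorer, and `toy_model` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    question: str
    answer: str   # gold answer
    context: str  # constructed contextual knowledge (CK)


# Hypothetical model interface: takes a prompt string, returns an answer string.
AskFn = Callable[[str], str]


def is_correct(prediction: str, gold: str) -> bool:
    # Assumption: case-insensitive substring match stands in for the real scorer.
    return gold.strip().lower() in prediction.strip().lower()


def filter_known(items: List[Item], ask: AskFn) -> List[Item]:
    # Step 1: closed-book query; keep only items the model already answers
    # correctly, i.e. knowledge it holds as PK.
    return [it for it in items if is_correct(ask(it.question), it.answer)]


def accuracy_with_context(items: List[Item], ask: AskFn) -> float:
    # Step 2: re-ask the same questions with CK prepended and measure accuracy.
    if not items:
        return 0.0
    hits = sum(
        is_correct(ask(f"{it.context}\n\nQuestion: {it.question}"), it.answer)
        for it in items
    )
    return hits / len(items)


def toy_model(prompt: str) -> str:
    # Toy stand-in for an LLM: knows one fact closed-book, but abandons it
    # whenever a context is attached (mimicking PK suppression).
    if "Question:" in prompt:
        return "unknown"
    if "France" in prompt:
        return "Paris"
    return "no idea"


items = [
    Item("What is the capital of France?", "Paris",
         "The Eiffel Tower is 330 metres tall."),
    Item("What is the capital of Atlantis?", "Poseidonis",
         "Atlantis appears in Plato's dialogues."),
]

known = filter_known(items, toy_model)
acc = accuracy_with_context(known, toy_model)
print(len(known), acc)
```

Because the second stage is scored only on questions the model answered correctly without context, any drop in accuracy there can be attributed to the added CK rather than to missing PK.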