When Context Leads but Parametric Memory Follows in Large Language Models

Yufei Tao; Adam Hiatt; Erik Haake; Antonie J. Jetter; Ameeta Agrawal

TL;DR

This study examines how nine widely used LLMs allocate knowledge between local context and global parameters when answering open-ended questions in knowledge-consistent scenarios. The patterns are consistent across models: all of them rely on both contextual and parametric knowledge, and hallucinations decrease as context increases.

Abstract

Large language models (LLMs) have demonstrated remarkable progress in leveraging diverse knowledge sources. This study investigates how nine widely used LLMs allocate knowledge between local context and global parameters when answering open-ended questions in knowledge-consistent scenarios. We introduce a novel dataset, WikiAtomic, and systematically vary context sizes to analyze how LLMs prioritize and utilize the provided information alongside their parametric knowledge. We also study their tendency to hallucinate under varying context sizes. Our findings reveal consistent patterns across models, including a stable reliance on both contextual (around 70%) and parametric (around 30%) knowledge, and a decrease in hallucinations with increasing context. These insights highlight the importance of more effective context organization and of developing models that use input more deterministically for robust performance.

Paper Structure (29 sections, 25 figures, 1 table)

Introduction
Task and Terminology
WikiAtomic Dataset
Experiments and Evaluation
Experiments
Evaluation
Models
Results and Analysis
In knowledge-consistent setting, how do models prioritize sources of knowledge?
Which parts of context are used?
How similar are various types of knowledge?
In knowledge-consistent setting, (how much) do models hallucinate?
Further Analyses
Discussion
Related Work
...and 14 more sections

Figures (25)

Figure 1: Overview of the dataset creation and model evaluation pipeline
Figure 2: An example of Context, Question, Model Response (GPT-4o) and the list of Atomic Response mapped to contextual knowledge and parametric knowledge
Figure 3: Knowledge-consistency between parametric knowledge and input context of WikiAtomic topics, computed using SBERT (Reimers and Gurevych, 2019)
Figure 4: Contextual (local), parametric (global), and total sentences in responses for (a) GPT-4o, (b) Claude Opus, (c) Llama 3 70B, and (d) Mistral 8x22B. On the $x$-axis, $k=0$ serves as the baseline when no context is provided.
Figure 5: Percentage of each quartile of context recalled in response (GPT-4o)
...and 20 more figures
