Evaluating the External and Parametric Knowledge Fusion of Large Language Models

Hao Zhang; Yuyang Zhang; Xiaoguang Li; Wenxuan Shi; Haonan Xu; Huanshuo Liu; Yasheng Wang; Lifeng Shang; Qun Liu; Yong Liu; Ruiming Tang

TL;DR

This work systematically investigates how large language models fuse external knowledge $K_e$ with parametric knowledge $K_p$ across four scenarios, introducing a data-construction pipeline and a knowledge-infusion protocol to enable controlled evaluation. By amassing electronics-domain data and partitioning it into external and parametric components, the authors quantify how continued training and supervised fine-tuning affect knowledge retention, elicitation, and boundary perception, revealing persistent challenges such as noise sensitivity and incomplete memory. Across backbones including GPT-4, ChatGLM, and Qwen, results show that parametric knowledge infusion can significantly aid fusion, especially when external information is partial or unhelpful, but performance never reaches ideal levels due to model capacity and dataset diversity constraints. The findings emphasize the need for robust strategies to filter noise, improve parametric memory usage, and enable reliable refusals when questions are unanswerable, thereby guiding future work on harmonizing external and parametric knowledge in LLMs with practical impact for retrieval-augmented systems and knowledge-intensive tasks.

Abstract

Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory. Prior studies, however, have tended to over-rely on external knowledge, underestimating the valuable contributions of an LLM's intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and must be supplemented by parametric knowledge. We propose to deconstruct knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior across each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, facilitating a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capability for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge, and in determining parametric knowledge boundaries. Our findings aim to steer future explorations on harmonizing external and parametric knowledge within LLMs.
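
The four-scenario decomposition can be pictured as a case split on whether the external evidence $K_e$ and the parametric knowledge $K_p$ each cover what a question needs. The sketch below is an illustrative assumption rather than the authors' formal definition: the `Coverage` levels, the `Scenario` names, and the `classify_scenario` helper are all hypothetical, and the mapping is inferred from the summary's mention of partial, unhelpful, and unanswerable cases.

```python
from enum import Enum


class Coverage(Enum):
    """How much of the needed knowledge the external evidence K_e supplies (hypothetical levels)."""
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


class Scenario(Enum):
    """Illustrative labels for the four fusion scenarios (names are assumptions)."""
    EXTERNAL_SUFFICIENT = 1  # K_e alone answers the question
    EXTERNAL_PARTIAL = 2     # K_e is incomplete; K_p must fill the gap
    PARAMETRIC_ONLY = 3      # K_e is unhelpful; the answer rests on K_p
    UNANSWERABLE = 4         # neither source suffices; the model should refuse


def classify_scenario(external: Coverage, parametric_covers_gap: bool) -> Scenario:
    """Map (K_e coverage, whether K_p covers the remainder) to a scenario label."""
    if external is Coverage.FULL:
        # External evidence is enough on its own, regardless of K_p.
        return Scenario.EXTERNAL_SUFFICIENT
    if not parametric_covers_gap:
        # Something is missing and K_p cannot supply it.
        return Scenario.UNANSWERABLE
    if external is Coverage.PARTIAL:
        return Scenario.EXTERNAL_PARTIAL
    return Scenario.PARAMETRIC_ONLY
```

For example, `classify_scenario(Coverage.PARTIAL, True)` yields `Scenario.EXTERNAL_PARTIAL`, the case where external evidence must be supplemented by what the model already knows.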

Paper Structure (23 sections, 3 equations, 3 figures, 3 tables)

Introduction
Related Work
  Retrieval-augmented LLMs (RA-LLM).
  Parametric Knowledge in LLMs.
  Knowledge Fusion of LLMs.
Task Definition
Dataset Construction
  Data Source Preparation.
  Dataset Construction.
  Dataset Analysis.
Experiment Setup
  Backbone Model.
  Parametric Knowledge Infusion.
  Evaluation Metrics.
Experiment Results and Analysis
...and 8 more sections

Figures (3)

Figure 1: An illustration of four parametric and external knowledge fusion scenarios in LLMs.
Figure 2: Overview of dataset construction. We first retrieve electronics documents from websites. The documents are split into two portions by release date and decomposed into paragraphs. QA pairs for each scenario are generated by prompting LLMs, and the corresponding external evidence and noise are added as supporting sources. The outdated portion is injected into LLMs through pre-training or fine-tuning.
Figure 3: Distribution of the number of associated evidence items per QA sample.
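
The date-based split described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Document` type, the `split_by_release_date` and `to_paragraphs` helpers, the cutoff date, and the blank-line paragraph rule are all hypothetical. Only the idea that older documents form the portion injected into the model, while newer ones serve as external evidence, comes from the caption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """A crawled document (hypothetical schema)."""
    doc_id: str
    released: date
    text: str


def split_by_release_date(docs, cutoff):
    """Partition documents around an assumed cutoff date.

    Documents released before the cutoff form the outdated portion that
    would be infused into the model; the rest are kept as external evidence.
    """
    parametric = [d for d in docs if d.released < cutoff]
    external = [d for d in docs if d.released >= cutoff]
    return parametric, external


def to_paragraphs(doc):
    """Decompose a document into non-empty paragraphs, splitting on blank lines."""
    return [p.strip() for p in doc.text.split("\n\n") if p.strip()]


docs = [
    Document("a", date(2022, 5, 1), "Old spec.\n\nOld details."),
    Document("b", date(2024, 1, 15), "New spec."),
]
# The cutoff below is an arbitrary example value.
parametric_docs, external_docs = split_by_release_date(docs, date(2023, 1, 1))
```

Here `parametric_docs` holds document `a` and `external_docs` holds document `b`; each can then be passed through `to_paragraphs` before QA pairs are generated.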
