Evaluating the External and Parametric Knowledge Fusion of Large Language Models
Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang
TL;DR
This work systematically investigates how large language models fuse external knowledge $K_e$ with parametric knowledge $K_p$ across four scenarios, introducing a data-construction pipeline and a knowledge-infusion protocol that enable controlled evaluation. The authors collect electronics-domain data and partition it into external and parametric components, then quantify how continued training and supervised fine-tuning affect knowledge retention, elicitation, and boundary perception. The experiments reveal persistent challenges such as noise sensitivity and incomplete memory. Across backbones including GPT-4, ChatGLM, and Qwen, parametric knowledge infusion can significantly aid fusion, especially when external information is partial or unhelpful, but performance never reaches ideal levels because of constraints in model capacity and dataset diversity. The findings emphasize the need for robust strategies to filter noise, improve the use of parametric memory, and refuse reliably when questions are unanswerable. They are intended to guide future work on harmonizing external and parametric knowledge in LLMs, with practical relevance to retrieval-augmented systems and knowledge-intensive tasks.
Abstract
Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their outdated and static parametric memory. Prior studies, however, have tended to over-rely on external knowledge, underestimating the valuable contribution of an LLM's intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and must be supplemented by parametric knowledge. We propose to deconstruct knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior across each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, facilitating a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capability for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge, and in determining parametric knowledge boundaries. Our findings aim to steer future explorations on harmonizing external and parametric knowledge within LLMs.
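
To make the four-scenario decomposition concrete, the following is a minimal sketch of how a question might be assigned to a scenario according to which knowledge source covers the facts it needs. The abstract does not spell out the scenario definitions, so the labels S1 to S4, the function name `classify`, and the representation of knowledge as sets of fact identifiers are all illustrative assumptions, not the authors' implementation.

```python
from enum import Enum
from typing import Set


class Scenario(Enum):
    # Labels and descriptions are assumptions made for illustration.
    S1 = "external knowledge alone is sufficient"
    S2 = "external knowledge is partial; parametric knowledge fills the gap"
    S3 = "external knowledge is unhelpful; the answer rests on parametric knowledge"
    S4 = "neither source suffices; the model should refuse"


def classify(needed: Set[str], external: Set[str], parametric: Set[str]) -> Scenario:
    """Assign a question to a scenario from the facts it needs.

    needed     -- hypothetical identifiers of the facts required to answer
    external   -- facts present in the supplied external knowledge (K_e)
    parametric -- facts assumed to be stored in the model's parameters (K_p)
    """
    if needed <= external:
        return Scenario.S1  # K_e covers everything
    if not needed <= (external | parametric):
        return Scenario.S4  # some required fact is in neither source
    if needed & external:
        return Scenario.S2  # K_e covers part, K_p covers the rest
    return Scenario.S3      # K_e contributes nothing, K_p covers all


if __name__ == "__main__":
    print(classify({"a", "b"}, {"a"}, {"b"}).name)
```

In this reading, S2 and S3 are the cases where parametric knowledge must be elicited, and S4 is the case where a refusal is the desired behavior.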
