
Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

TL;DR

Knot introduces a knowledge conflict dataset to probe how large language models reconcile external conflicting knowledge with their parametric memory across three reasoning modes: direct extraction (Knot-S), explicit multi-hop reasoning (Knot-E), and implicit multi-hop reasoning (Knot-I). The authors develop a saliency-guided data construction pipeline (ego networks, TransE sampling, data-to-text generation) and human-annotated rationales to create high-quality training and evaluation material. A broad evaluation across pre-trained and assistant LLMs reveals that models perform well on Knot-S but struggle with Knot-E and Knot-I; prompting alone offers limited gains, while fine-tuning on Knot data yields substantial improvements, especially for smaller models. The work also analyzes how model size and reasoning type influence reliance on parametric memory, providing empirically grounded guidelines for selecting strategies (prompting, decoding, or fine-tuning) to resolve knowledge conflicts in complex scenarios. Overall, Knot advances understanding of knowledge-conflict resolution in LLMs and offers practical directions for improving inference when external knowledge conflicts with memory.
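
The TL;DR names ego networks and TransE sampling as parts of the construction pipeline but does not say how they combine. The sketch below shows one plausible arrangement. It assumes, and this is our assumption rather than the paper's stated procedure, that a conflicting fact is made by replacing a triple's object with another entity from a local ego network. Each candidate is weighted by its TransE plausibility score, -||h + r - t||. The helpers `ego_network` and `sample_conflicting_object`, the toy graph, and the random (untrained) embeddings are all hypothetical. A real pipeline would use trained embeddings and would probably also filter candidates by entity type.

```python
import math
import random
from collections import deque


def ego_network(edges, center, radius=1):
    """Entities within `radius` hops of `center`, treating edges as undirected."""
    adjacency = {}
    for head, _, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand past the ego-network radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def transe_score(ent, rel, h, r, t):
    """TransE plausibility -||h + r - t||_2: closer to 0 means more plausible."""
    diff = [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]
    return -math.sqrt(sum(d * d for d in diff))


def sample_conflicting_object(ent, rel, triple, candidates, temperature=1.0, rng=None):
    """Hypothetical helper: draw t' != t with probability softmax(score / temperature)."""
    rng = rng or random.Random(0)
    h, r, t = triple
    pool = sorted(c for c in candidates if c not in (h, t))
    if not pool:
        raise ValueError("no candidate substitutes in the ego network")
    scores = [transe_score(ent, rel, h, r, c) / temperature for c in pool]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(pool, weights=weights, k=1)[0]


# Toy graph and random (untrained) embeddings, purely illustrative.
edges = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Berlin", "capital_of", "Germany"),
    ("Germany", "member_of", "EU"),
]
emb_rng = random.Random(42)
entities = sorted({e for h, _, t in edges for e in (h, t)})
relations = sorted({r for _, r, _ in edges})
ent = {e: [emb_rng.gauss(0.0, 1.0) for _ in range(8)] for e in entities}
rel = {r: [emb_rng.gauss(0.0, 1.0) for _ in range(8)] for r in relations}

# Candidates come from the 2-hop ego network around the original object.
neighborhood = ego_network(edges, "France", radius=2)
substitute = sample_conflicting_object(ent, rel, ("Paris", "capital_of", "France"), neighborhood)
print(sorted(neighborhood), substitute)
```

Here the 2-hop ego network around "France" contains Paris, EU, and Germany, while Berlin lies three hops away and is excluded. The softmax temperature controls how strongly sampling favors the most plausible substitutes.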

Abstract

Providing knowledge documents to large language models (LLMs) has emerged as a promising way to update the static knowledge stored in their parameters. However, knowledge in a document may conflict with an LLM's memory when the knowledge in its parameters is outdated or incorrect. This makes it necessary to examine how well LLMs can assimilate supplemental external knowledge that conflicts with their memory. While previous studies have examined to what extent LLMs extract conflicting knowledge from the provided text, they neglect the need to reason with conflicting knowledge. Furthermore, there is a lack of detailed analysis of strategies that enable LLMs to resolve conflicting knowledge via prompting, decoding strategies, and supervised fine-tuning. To address these limitations, we construct a new dataset, dubbed KNOT, for examining knowledge conflict resolution in the form of question answering. KNOT facilitates in-depth analysis by dividing reasoning with conflicting knowledge into three levels: (1) Direct Extraction, which directly extracts conflicting knowledge to answer questions; (2) Explicit Reasoning, which reasons with conflicting knowledge when the reasoning path is explicitly provided in the question; and (3) Implicit Reasoning, where LLMs must infer the reasoning path independently to answer questions with conflicting knowledge. We also conduct extensive experiments on KNOT to establish empirical guidelines for LLMs to utilize conflicting knowledge in complex circumstances. The dataset and associated code are available at https://github.com/THU-KEG/KNOT.
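
The difference between the first two levels can be made concrete with a toy knowledge store. This is a minimal sketch under two assumptions of ours: facts are modeled as (subject, relation) → object entries, and document facts should override parametric memory whenever the two disagree. The helpers `merge_knowledge` and `follow_path` and all facts below are hypothetical, not taken from KNOT.

```python
def merge_knowledge(parametric, document):
    """Overlay document facts on parametric memory; the document wins on conflicts."""
    merged = dict(parametric)
    merged.update(document)
    return merged


def follow_path(kb, start, path):
    """Follow relations hop by hop from `start`; return None if a hop is missing."""
    node = start
    for relation in path:
        node = kb.get((node, relation))
        if node is None:
            return None
    return node


# Hypothetical facts keyed by (subject, relation) -> object.
parametric = {
    ("Eiffel Tower", "located_in"): "Paris",
    ("Paris", "country"): "France",
    ("Rome", "country"): "Italy",
}
document = {("Eiffel Tower", "located_in"): "Rome"}  # counterfactual document fact

kb = merge_knowledge(parametric, document)
# (1) Direct extraction: one hop, answered by the conflicting fact itself.
print(follow_path(kb, "Eiffel Tower", ["located_in"]))             # Rome
# (2) Explicit reasoning: the question spells out the relation path.
print(follow_path(kb, "Eiffel Tower", ["located_in", "country"]))  # Italy
# Answering from memory alone follows the outdated chain instead.
print(follow_path(parametric, "Eiffel Tower", ["located_in", "country"]))  # France
```

The two-hop case shows why reasoning is harder than extraction. The conflicting fact must be used as an intermediate step, and every later hop then builds on it. Implicit reasoning would additionally require inferring `path` from the question itself, which this sketch does not attempt.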

Paper Structure

This paper contains 39 sections, 7 figures, and 15 tables.

Figures (7)

  • Figure 1: Example questions from Knot where knowledge conflicts are resolved via extraction, explicit reasoning, and implicit reasoning.
  • Figure 2: The overall framework for constructing Knot.
  • Figure 3: Accuracy of LLaMA-2-70B-Chat on Knot as a function of the saliency of each question's topic entity, when no documents are provided. Accuracy is positively correlated with saliency.
  • Figure 4: Trigram distribution of questions in Knot.
  • Figure 5: Trigram distribution of the annotated rationale.
  • ...and 2 more figures
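
Figures 4 and 5 report trigram distributions of questions and of the annotated rationales. A distribution of this kind might be computed as shown below. The sketch assumes each text is binned by its first three word tokens, which is a common convention for such plots. The tokenization actually used for KNOT is not specified here, and the helper names and sample questions are hypothetical.

```python
import re
from collections import Counter


def leading_trigram(text):
    """Lowercase the text and join its first three word tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(tokens[:3])


def trigram_distribution(texts, top_k=5):
    """Return the top_k leading trigrams with their share of all non-empty texts."""
    counts = Counter(leading_trigram(t) for t in texts if t.strip())
    total = sum(counts.values())
    return [(trigram, n / total) for trigram, n in counts.most_common(top_k)]


# Hypothetical questions, not drawn from KNOT.
questions = [
    "What is the capital of France?",
    "What is the population of Rome?",
    "Who is the founder of Acme?",
    "What is the country of the Eiffel Tower?",
]
for trigram, share in trigram_distribution(questions):
    print(f"{trigram}: {share:.0%}")  # what is the: 75%, then who is the: 25%
```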