Can LLMs Reconcile Knowledge Conflicts in Counterfactual Reasoning

Khurram Yamin, Gaurav Ghosal, Bryan Wilder

TL;DR

The paper investigates whether LLMs can reconcile parametric world knowledge with in-context counterfactual premises in multi-hop reasoning. It introduces counterfactual QA benchmarks and toy experiments to isolate scenarios where the context reinforces, adds to, contradicts, or is irrelevant to prior knowledge. Empirical results show two main failure modes, context-ignoring and context-overfitting. Simple finetuning often degrades stored knowledge, and including counterfactual data in pretraining yields trade-offs between reasoning ability and factual accuracy. The findings highlight fundamental limits in current LLMs' ability to adapt internal knowledge on demand, motivating new training and architectural approaches for dynamic knowledge integration.

Abstract

Large Language Models have been shown to contain extensive world knowledge in their parameters, enabling impressive performance on many knowledge-intensive tasks. However, when deployed in novel settings, LLMs often encounter situations where they must integrate parametric knowledge with new or unfamiliar information. In this work, we explore whether LLMs can combine knowledge in-context with their parametric knowledge through the lens of counterfactual reasoning. Through synthetic and real experiments in multi-hop reasoning problems, we show that LLMs generally struggle with counterfactual reasoning, often resorting to exclusively using their parametric knowledge. Moreover, we show that simple post-hoc finetuning can struggle to instill counterfactual reasoning ability -- often leading to degradation in stored parametric knowledge. Ultimately, our work reveals important limitations of current LLMs' abilities to re-purpose parametric knowledge in novel settings.

Paper Structure

This paper contains 40 sections, 6 figures.

Figures (6)

  • Figure 1: Concrete instantiation of the query. The counterfactual premise overrides Paris’s country to Italy. A correct system performs Contextual Override and Selective Retrieval and answers Italy.
  • Figure 2: Causal Counterfactual (CF) plots comparing results for standard GPT-4o, GPT-4o with CoT, finetuned GPT-4o, and GPT-5 (Thinking). (a) Counterfactual Reinforces Prior, (b) Counterfactual Adds New Information, (c) Counterfactual Conflicts with Prior, (d) Counterfactual is Irrelevant to Prior and Query. 95% confidence intervals are shown.
  • Figure 3: Conceptual visualization of toy Counterfactual (CF) tasks with shortened segments. Solid arrows are stored (parametric) edges; dashed arrows are counterfactual premises. Panels (a)–(c) are the three evaluation splits; panel (d) is the factual CoT control.
  • Figure 4: Breakdown of Performance in a Conceptual Setting. (a-c) We plot the test accuracy across stages of finetuning for the three types of counterfactual reasoning queries introduced in Section 4.1. Our findings reveal that while finetuning can enable the transformer to incorporate the contextual knowledge, it is ineffective at inducing selective usage of contextual knowledge. As a result, performance on the irrelevant counterfactual split is low. (d) We show the performance on a factual CoT task which does not introduce any conflict with parametric knowledge. We find that finetuning is capable of incorporating this novel task into the model.
  • Figure 5: Incorporating Counterfactual Data in Pretraining. We plot the worst-split accuracy across pretraining when counterfactual examples are incorporated throughout pretraining, both with and without marking of counterfactual reasoning prompts. We observe that in both cases the counterfactual reasoning performance approaches 100% and that marking counterfactual reasoning prompts accelerates training.
  • ...and 1 more figures
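
The setting described in Figures 1 and 3 can be illustrated with a minimal sketch. It treats stored (parametric) knowledge as a lookup table and counterfactual premises as in-context overrides for a two-hop query. Everything here is a hypothetical illustration rather than the paper's benchmark or code: the entities other than Paris and Italy, the relation names, the example query, and the `classify` rule for the four premise types are all assumptions.

```python
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]  # (subject, relation)

# Hypothetical stand-in for parametric knowledge (assumed toy facts).
PARAMETRIC: Dict[Edge, str] = {
    ("Eiffel Tower", "located_in"): "Paris",
    ("Paris", "country"): "France",
    ("Colosseum", "located_in"): "Rome",
    ("Rome", "country"): "Italy",
}


def lookup(subject: str, relation: str, premises: Dict[Edge, str]) -> Optional[str]:
    """Contextual override: an in-context premise wins over a stored edge."""
    key = (subject, relation)
    if key in premises:
        return premises[key]
    # Selective retrieval: fall back to stored knowledge for untouched edges.
    return PARAMETRIC.get(key)


def two_hop(entity: str, premises: Dict[Edge, str]) -> Optional[str]:
    """Answer 'In which country is <entity>?' via entity -> city -> country."""
    city = lookup(entity, "located_in", premises)
    if city is None:
        return None
    return lookup(city, "country", premises)


def classify(key: Edge, value: str, entity: str) -> str:
    """Label a single premise relative to stored knowledge and the query path."""
    city = lookup(entity, "located_in", {key: value})
    path = {(entity, "located_in"), (city, "country")}
    if key not in path:
        return "irrelevant"
    stored = PARAMETRIC.get(key)
    if stored is None:
        return "adds"
    return "reinforces" if stored == value else "conflicts"


if __name__ == "__main__":
    premise = {("Paris", "country"): "Italy"}
    print(two_hop("Eiffel Tower", premise))
    print(classify(("Paris", "country"), "Italy", "Eiffel Tower"))
```

In this sketch, a context-ignoring failure corresponds to skipping the `premises` check in `lookup`, while a context-overfitting failure corresponds to letting a premise change the answer even when `classify` labels it irrelevant.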