
Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning

Tianqing Fang, Zhaowei Wang, Wenxuan Zhou, Hongming Zhang, Yangqiu Song, Muhao Chen

TL;DR

This work addresses knowledge conflicts in event temporal reasoning by formalizing four bias types that cause temporal relation (TempRel) predictions to diverge from what the context actually states. It introduces a bias-detection framework and Counterfactual Data Augmentation (CDA) to mitigate these conflicts, applicable to both pre-trained language models (PLMs) and large language models (LLMs). Empirical results on TORQUE and MATRES show that knowledge-conflict subsets are challenging and that CDA improves performance, often outperforming bias-agnostic baselines and, in many cases, GDA; LLMs benefit notably from counterfactual demonstrations. The study suggests that reducing bias and hallucination in event temporal reasoning makes inference more faithful to the context, with practical implications for robust narrative understanding and downstream QA tasks.

Abstract

Event temporal reasoning aims to identify the temporal relations between two or more events in narratives. However, knowledge conflicts arise when there is a mismatch between the actual temporal relations of events in the context and the prior knowledge or biases learned by the model. In this paper, we propose to detect knowledge-conflict examples in event temporal reasoning using bias indicators, which include event relation prior bias, tense bias, narrative bias, and dependency bias. We define conflict examples as those where event relations are opposite to biased or prior relations. To mitigate event-related knowledge conflicts, we introduce a Counterfactual Data Augmentation (CDA)-based method that can be applied to both Pre-trained Language Models (PLMs) and Large Language Models (LLMs), either as additional training data or as demonstrations for In-Context Learning. Experiments suggest that both PLMs and LLMs suffer from knowledge conflicts in event temporal reasoning, and that CDA has the potential to reduce hallucination and improve model performance.
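The abstract defines a conflict example as one whose gold relation is opposite to the biased or prior relation. A minimal sketch of that rule for the event relation prior bias only, assuming the prior is estimated by counting BEFORE/AFTER labels for each event lemma pair in a reference corpus. The names `relation_prior` and `is_conflict`, the `min_count` threshold, and the lemma strings are hypothetical; the paper's actual estimator and thresholds may differ, and the other three bias types are not covered here.

```python
from collections import Counter, defaultdict

# Assumed MATRES-style labels; only BEFORE/AFTER have an "opposite" here.
OPPOSITE = {"BEFORE": "AFTER", "AFTER": "BEFORE"}


def relation_prior(corpus):
    """Count relation labels per ordered (event1, event2) pair.

    Each triple is also counted in reverse with the flipped label, so
    (b, a, AFTER) adds evidence for (a, b, BEFORE).
    """
    counts = defaultdict(Counter)
    for e1, e2, rel in corpus:
        counts[(e1, e2)][rel] += 1
        if rel in OPPOSITE:
            counts[(e2, e1)][OPPOSITE[rel]] += 1
    return counts


def is_conflict(example, prior, min_count=1):
    """Return True if the gold relation is opposite to the pair's majority prior."""
    e1, e2, gold = example
    pair_counts = prior.get((e1, e2))  # .get avoids inserting empty entries
    if not pair_counts or sum(pair_counts.values()) < min_count:
        return False  # no usable prior for this pair
    majority, _ = pair_counts.most_common(1)[0]
    return OPPOSITE.get(majority) == gold


# Toy corpus (hypothetical lemmas): getting sick usually precedes seeing a doctor.
corpus = [
    ("get_sick", "see_doctor", "BEFORE"),
    ("get_sick", "see_doctor", "BEFORE"),
    ("see_doctor", "get_sick", "AFTER"),
    ("get_sick", "see_doctor", "AFTER"),
]
prior = relation_prior(corpus)
print(is_conflict(("get_sick", "see_doctor", "AFTER"), prior))   # True
print(is_conflict(("get_sick", "see_doctor", "BEFORE"), prior))  # False
```

Counting each pair in both directions lets `("see_doctor", "get_sick", "AFTER")` act as evidence for `("get_sick", "see_doctor", "BEFORE")`, so the prior does not depend on argument order.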

Paper Structure (54 sections, 11 equations, 3 figures, 17 tables)

Figures (3)

  • Figure 1: An example of a knowledge-conflict instance. The actual TempRel in the context differs from the biased or prior TempRel in the corpus and the language model, leading to the emergence of knowledge conflicts.
  • Figure 2: An overview of the CDA pipeline.
  • Figure 3: Effect of varying proportions of Counterfactual Data Augmentation (CDA) on MATRES. Models benefit from increased amounts of CDA data.
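Figure 3 varies how much CDA data is added when training on MATRES. A minimal sketch of such a mixing step, assuming "proportion" means the fraction of a pool of counterfactual examples appended to the original training set; the paper's exact definition may differ, and `mix_cda` is a hypothetical helper.

```python
import random


def mix_cda(original, counterfactual, proportion, seed=0):
    """Return the original examples plus a random fraction of the counterfactual pool."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    k = round(len(counterfactual) * proportion)  # number of CDA examples to add
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    sampled = rng.sample(counterfactual, k)
    mixed = list(original) + sampled
    rng.shuffle(mixed)  # interleave original and counterfactual examples
    return mixed


# Hypothetical placeholders standing in for real training instances.
original = [("orig", i) for i in range(10)]
cda_pool = [("cda", i) for i in range(8)]
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, len(mix_cda(original, cda_pool, p)))
```

Keeping the original set intact and only varying the sampled counterfactual share isolates the effect of the CDA amount, which is the variable Figure 3 reports.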