Improving Large Language Models in Event Relation Logical Prediction

Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, Dongsheng Li

TL;DR

The paper investigates why large language models struggle with event relation logic in Event Relation Extraction (ERE) and proposes three complementary strategies to endow LLMs with logical reasoning capabilities: generative-based reasoning with logical constraints, retrieval-based constraint augmentation, and finetuning on a synthesized high-order reasoning dataset (LLM-ERL). It introduces a logical consistency metric (LI) to quantify coherence and builds LLM-ERL to support multi-hop reasoning across two major ERE datasets. Empirical results show that injecting relevant constraints reduces logical inconsistencies and improves micro-F1 across MAVEN-ERE and Causal-TimeBank, with finetuned models like Llama2-FT achieving strong gains (e.g., 26.4% F1 on MAVEN-ERE), approaching or surpassing some larger LLM baselines. The work highlights the practical value of explicit logical guidance for constrained reasoning tasks and suggests future directions for generalizing logical reasoning capabilities beyond ERE to broader deductive tasks.
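
To make the idea of a logical consistency check concrete, here is a minimal sketch, not the paper's LI definition. It assumes a small, hypothetical table of mutually exclusive relation labels and reports the fraction of label pairs that conflict within each event pair. The function name `inconsistency_rate`, the `CONFLICTS` table, and the sample predictions are all illustrative assumptions; only the event names "FIRE" and "collapsed" are borrowed from Figure 1.

```python
from itertools import combinations

# Hypothetical constraint table (an assumption for illustration, not taken
# from the paper): label pairs that cannot both hold for one event pair.
CONFLICTS = {
    frozenset({"CAUSE", "SIMULTANEOUS"}),
    frozenset({"CAUSE", "AFTER"}),
    frozenset({"BEFORE", "AFTER"}),
    frozenset({"BEFORE", "SIMULTANEOUS"}),
    frozenset({"AFTER", "SIMULTANEOUS"}),
}


def inconsistency_rate(predictions):
    """Return the fraction of label pairs, within each event pair, that conflict."""
    total = 0
    conflicting = 0
    for labels in predictions.values():
        # Compare every unordered pair of distinct labels predicted for this event pair.
        for a, b in combinations(sorted(set(labels)), 2):
            total += 1
            if frozenset({a, b}) in CONFLICTS:
                conflicting += 1
    return conflicting / total if total else 0.0


# Made-up predictions, keyed by (event_1, event_2).
predictions = {
    ("FIRE", "collapsed"): ["CAUSE", "SIMULTANEOUS"],  # conflicting under the table above
    ("storm", "flood"): ["CAUSE", "BEFORE"],           # consistent under the table above
}
print(inconsistency_rate(predictions))  # 0.5
```

A lower value means the predicted relations contradict each other less often under the chosen constraint table; the result depends entirely on which constraints are encoded.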

Abstract

Event relations are crucial for narrative understanding and reasoning. Governed by nuanced logic, event relation extraction (ERE) is a challenging task that demands thorough semantic understanding and rigorous logical reasoning. In this paper, we conduct an in-depth investigation to systematically explore the capability of LLMs in understanding and applying event relation logic. In more detail, we first investigate the deficiencies of LLMs in logical reasoning across different tasks. Our study reveals that LLMs are not logically consistent reasoners, which results in their suboptimal performance on tasks that need rigorous reasoning. To address this, we explore three different approaches to endow LLMs with event relation logic, and thus enable them to generate more coherent answers across various scenarios. Based on our approach, we also contribute a synthesized dataset (LLM-ERL) involving high-order reasoning for evaluation and fine-tuning. Extensive quantitative and qualitative analyses on different tasks validate the effectiveness of our approaches and provide insights for solving practical tasks with LLMs in future work. Code is available at https://github.com/chenmeiqii/Teach-LLM-LR.

Paper Structure (69 sections, 15 figures, 7 tables, 1 algorithm)

Figures (15)

  • Figure 1: An example of an LLM generating logically inconsistent answers. We let an LLM (e.g., ChatGPT) predict the relations between the events "FIRE" and "collapsed" from the given passage. The LLM predicts an incorrect answer (i.e., SIMULTANEOUS) because it ignores some prior logic in this scenario.
  • Figure 2: Performance of ChatGPT in the pilot study.
  • Figure 3: Error analysis of ChatGPT in the pilot study by human evaluation. CE and FE denote incorrectness and unfaithfulness errors, respectively.
  • Figure 4: Incorporating logical constraints into LLMs using generative, retrieval, and finetuning-based approaches. The dashed boxes indicate answers output by LLMs, and the underlined texts indicate the logical constraints.
  • Figure 5: Ablation Study of ChatGPT for demonstrations and iterative retrieval, where "logic cst" denotes the event relation logical constraints.
  • ...and 10 more figures