Improving Large Language Models in Event Relation Logical Prediction
Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, Dongsheng Li
TL;DR
The paper investigates why large language models struggle with event relation logic in Event Relation Extraction (ERE) and proposes three complementary strategies to endow LLMs with logical reasoning capabilities: generative-based reasoning with logical constraints, retrieval-based constraint augmentation, and finetuning on a synthesized high-order reasoning dataset (LLM-ERL). It introduces a logical inconsistency metric (LI) to quantify how often a model's answers conflict with one another, and builds LLM-ERL to support multi-hop reasoning. Empirical results on two major ERE datasets, MAVEN-ERE and Causal-TimeBank, show that injecting relevant constraints reduces logical inconsistencies and improves micro-F1, with finetuned models like Llama2-FT achieving strong gains (e.g., 26.4% F1 on MAVEN-ERE), approaching or surpassing some larger LLM baselines. The work highlights the practical value of explicit logical guidance for constrained reasoning tasks and suggests future directions for generalizing logical reasoning capabilities beyond ERE to broader deductive tasks.
Abstract
Event relations are crucial for narrative understanding and reasoning. Governed by nuanced logic, event relation extraction (ERE) is a challenging task that demands thorough semantic understanding and rigorous logical reasoning. In this paper, we conduct an in-depth investigation to systematically explore the capability of LLMs in understanding and applying event relation logic. In more detail, we first investigate the deficiencies of LLMs in logical reasoning across different tasks. Our study reveals that LLMs are not logically consistent reasoners, which results in their suboptimal performance on tasks that need rigorous reasoning. To address this, we explore three different approaches to endow LLMs with event relation logic, and thus enable them to generate more coherent answers across various scenarios. Based on our approach, we also contribute a synthesized dataset (LLM-ERL) involving high-order reasoning for evaluation and fine-tuning. Extensive quantitative and qualitative analyses on different tasks validate the effectiveness of our approaches and provide insights for solving practical tasks with LLMs in future work. Code is available at https://github.com/chenmeiqii/Teach-LLM-LR.
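To make the idea of measuring logical (in)consistency concrete, the following is a minimal sketch of an LI-style score: the fraction of pairwise answer combinations for an event pair that violate a constraint. The conflict rules, the relation labels, and the function name `logical_inconsistency` are illustrative assumptions, not the paper's actual constraint set or implementation.

```python
from itertools import combinations

# Hypothetical conflict rules between answers given for the SAME ordered
# event pair. These are illustrative assumptions, not the paper's rules.
CONFLICTS = {
    # An event cannot cause another event that happened before it.
    frozenset({("temporal", "AFTER"), ("causal", "CAUSE")}),
    # Coreferent mentions denote one event, so they cannot be ordered
    # or causally linked.
    frozenset({("temporal", "BEFORE"), ("coreference", "COREFERENCE")}),
    frozenset({("causal", "CAUSE"), ("coreference", "COREFERENCE")}),
}


def logical_inconsistency(predictions):
    """Return conflicts / checked answer pairs (0.0 if nothing to check).

    predictions maps an event pair (e1, e2) to a dict of
    relation_type -> predicted label.
    """
    conflicts = 0
    checked = 0
    for answers in predictions.values():
        # Compare every two answers given for this event pair.
        for a, b in combinations(sorted(answers.items()), 2):
            checked += 1
            if frozenset({a, b}) in CONFLICTS:
                conflicts += 1
    return conflicts / checked if checked else 0.0


example = {
    ("fire", "evacuation"): {
        "temporal": "AFTER", "causal": "CAUSE", "coreference": "NONE",
    },
    ("quake", "tsunami"): {
        "temporal": "BEFORE", "causal": "CAUSE", "coreference": "NONE",
    },
}
print(logical_inconsistency(example))
```

In this example the first event pair contains one conflict (a cause predicted to occur after its effect) out of six checked answer combinations across both pairs. A lower score means more coherent predictions.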
