SemEval-2026 Task 12: Abductive Event Reasoning: Towards Real-World Event Causal Inference for Large Language Models

Pengfei Cao, Mingxuan Yang, Yubo Chen, Chenlong Zhang, Mingxuan Liu, Kang Liu, Jun Zhao

Abstract

Understanding why real-world events occur is important for both natural language processing and practical decision-making, yet direct-cause inference remains underexplored in evidence-rich settings. To address this gap, we organized SemEval-2026 Task 12: Abductive Event Reasoning (AER); the task data is available at https://github.com/sooo66/semeval2026-task12-dataset.git. The task asks systems to identify the most plausible direct cause of a target event from supporting evidence. We formulate AER as an evidence-grounded multiple-choice benchmark that captures key challenges of real-world causal reasoning, including distributed evidence, indirect background factors, and semantically related but non-causal distractors. The shared task attracted 122 participants and received 518 submissions. This paper presents the task formulation, dataset construction pipeline, evaluation setup, and system results. AER provides a focused benchmark for abductive reasoning over real-world events and highlights challenges for future work on causal reasoning and multi-document understanding.

Paper Structure

This paper contains 38 sections, 1 equation, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of the Abductive Event Reasoning (AER) task. Given a noisy multi-document evidence collection, the goal is to identify the most plausible direct cause of a target event via evidence-grounded abductive inference. The task is challenging because the supporting evidence is distributed and noisy, systems must focus on direct triggers rather than background conditions, and multiple answer options may be correct.
  • Figure 2: Construction pipeline of the AER benchmark.
  • Figure 3: Distributional overview of the dataset: (A) topic composition across six categories, (B) frequency of correct labels at options A–D, (C) topic representative-time distribution with the GPT-4 knowledge cutoff (2023-12-01) marked by a dashed line, and (D) document length distribution in tokens, with mean and median indicated.
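
To make the task format in the Figure 1 and Figure 3 captions concrete, the following is a minimal sketch of how one AER instance might be represented: a target event, a multi-document evidence collection, options A–D, and a set of correct labels (since more than one option may be correct). The class name, field names, example strings, and the exact-match check are all hypothetical illustrations; they are not the released data schema or the official evaluation metric.

```python
from dataclasses import dataclass


@dataclass
class AERInstance:
    """Hypothetical container for one AER question (field names are assumptions)."""
    target_event: str          # the event whose direct cause is sought
    documents: list[str]       # noisy multi-document evidence collection
    options: dict[str, str]    # candidate causes keyed by label "A".."D"
    gold: frozenset[str]       # labels of the correct option(s); may hold more than one


def is_exact_match(instance: AERInstance, predicted: set[str]) -> bool:
    """Return True only if the predicted label set equals the gold label set.

    This strict criterion is an assumption for illustration, not the task's metric.
    """
    return frozenset(predicted) == instance.gold


# Toy instance with invented placeholder text.
example = AERInstance(
    target_event="placeholder target event",
    documents=["placeholder document 1", "placeholder document 2"],
    options={
        "A": "candidate cause 1",
        "B": "candidate cause 2",
        "C": "candidate cause 3",
        "D": "candidate cause 4",
    },
    gold=frozenset({"A", "C"}),
)

full_prediction = is_exact_match(example, {"A", "C"})
partial_prediction = is_exact_match(example, {"A"})
```

Storing the gold answer as a set rather than a single letter keeps the representation compatible with questions that have several correct options.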