SemEval-2026 Task 12: Abductive Event Reasoning: Towards Real-World Event Causal Inference for Large Language Models

Pengfei Cao; Mingxuan Yang; Yubo Chen; Chenlong Zhang; Mingxuan Liu; Kang Liu; Jun Zhao

Abstract

Understanding why real-world events occur is important for both natural language processing and practical decision-making, yet direct-cause inference remains underexplored in evidence-rich settings. To address this gap, we organized SemEval-2026 Task 12: Abductive Event Reasoning (AER), whose task data is available at https://github.com/sooo66/semeval2026-task12-dataset.git. The task asks systems to identify the most plausible direct cause of a target event from supporting evidence. We formulate AER as an evidence-grounded multiple-choice benchmark that captures key challenges of real-world causal reasoning, including distributed evidence, indirect background factors, and semantically related but non-causal distractors. The shared task attracted 122 participants and received 518 submissions. This paper presents the task formulation, dataset construction pipeline, evaluation setup, and system results. AER provides a focused benchmark for abductive reasoning over real-world events and highlights challenges for future work on causal reasoning and multi-document understanding.

Paper Structure (38 sections, 1 equation, 3 figures, 4 tables)

Introduction
Related Work
Event Causality Reasoning.
Multi-document Reasoning.
Task Description
Task Overview
Task Formalization
Input and Output Format
Evaluation Metrics
Task Example
Dataset Construction
Overall Pipeline
Data Collection and Event Extraction
Timeline Construction
Causality Scoring and Human Verification
...and 23 more sections

Figures (3)

Figure 1: Overview of the Abductive Event Reasoning (AER) task. Given a noisy multi-document evidence collection, the goal is to identify the most plausible direct cause of a target event via evidence-grounded abductive inference. The task is challenging because the supporting evidence is distributed and noisy, systems must focus on direct triggers rather than background conditions, and multiple answer options may be correct.
Figure 2: Construction pipeline of the AER benchmark.
Figure 3: Distributional overview of the dataset: (A) topic composition across six categories, (B) frequency of correct labels at options A–D, (C) topic representative-time distribution with the GPT-4 knowledge cutoff (2023-12-01) marked by a dashed line, and (D) document length distribution in tokens, with mean and median indicated.
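
The task format described in the abstract and in Figures 1 and 3 (a target event, a noisy multi-document evidence collection, options A to D, and possibly more than one correct option) can be sketched as a small data structure. This is a minimal illustration only: the class name `AERInstance`, its field names, the helper `is_exact_match`, and the example content are all assumptions made here, not the released data schema or the official evaluation metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AERInstance:
    """Hypothetical container for one AER question (field names are assumptions)."""
    target_event: str
    evidence_docs: tuple[str, ...]  # noisy multi-document evidence collection
    options: dict[str, str]         # option label ("A".."D") -> candidate cause
    gold_labels: frozenset[str]     # one or more correct option labels


def is_exact_match(instance: AERInstance, predicted: set[str]) -> bool:
    """Illustrative check only: True when the predicted labels equal the gold labels."""
    return frozenset(predicted) == instance.gold_labels


# Invented example content, not taken from the dataset.
example = AERInstance(
    target_event="A city-wide power outage occurred.",
    evidence_docs=(
        "Doc 1: A storm damaged a substation on Monday night.",
        "Doc 2: Electricity demand had been rising for years.",
    ),
    options={
        "A": "A storm damaged a substation.",
        "B": "Electricity demand had been rising for years.",
        "C": "The city held a festival the same week.",
        "D": "A new mayor took office last year.",
    },
    gold_labels=frozenset({"A"}),
)
```

In this invented example, option B plays the role of an indirect background factor and option C a semantically related but non-causal distractor, so a prediction of `{"A", "B"}` would not match the gold set under this illustrative check.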
