AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

Nikolas Karafyllis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou

Abstract

We present a winning three-stage system for SemEval-2026 Task 12: Abductive Event Reasoning. The system combines graph-based retrieval, LLM-driven abductive reasoning with prompt design optimized through reflective prompt evolution, and post-hoc consistency enforcement. It ranks first on the evaluation-phase leaderboard with an accuracy score of 0.95. Cross-model error analysis across 14 models (7 families) reveals three shared inductive biases: causal chain incompleteness, proximate cause preference, and salience bias. Their cross-family convergence (51% cause-count reduction) indicates systematic rather than model-specific failure modes in multi-label causal reasoning.

Paper Structure (64 sections, 1 equation, 21 figures, 18 tables, 1 algorithm)

Figures (21)

  • Figure 1: System pipeline. Stage 1 constructs a hybrid document graph (Figure 2), selects dense/sparse entry points, retrieves the connected component, and filters disconnected distractors. Stage 2 performs structured analysis-before-answer prompting with self-consistency. Stage 3 applies eight post-hoc consistency heuristics.
  • Figure 2: Hybrid document-graph retrieval in three steps. Step 1: Build a hybrid similarity graph ($\alpha = 0.7$ dense $+$ $0.3$ sparse); disconnected documents ($d_9$--$d_{12}$) are potential distractors. Step 2: At query time, pick entry points from dense and sparse signals ($3 + 2$, deduplicated). Step 3: Retrieve the full connected component from the seeds, filter disconnected documents, and pass the selected topic context to the LLM reasoner.
  • Figure 3: Dataset composition across splits.
  • Figure 4: Answer frequency by option.
  • Figure 5: Distribution of answer cardinality.
  • ...and 16 more figures
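
The three retrieval steps named in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. The mixing weight (0.7 dense, 0.3 sparse) and the 3 + 2 entry points come from the caption; the edge threshold of 0.5, the function names, and the assumption that similarity scores arrive as precomputed matrices and lists are all hypothetical choices made for illustration.

```python
from collections import deque


def hybrid_graph(dense_sim, sparse_sim, alpha=0.7, threshold=0.5):
    """Step 1: link documents whose mixed similarity clears a threshold.

    alpha=0.7 follows the Figure 2 caption; threshold=0.5 is an assumed value.
    """
    n = len(dense_sim)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            score = alpha * dense_sim[i][j] + (1 - alpha) * sparse_sim[i][j]
            if score >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def entry_points(dense_q, sparse_q, k_dense=3, k_sparse=2):
    """Step 2: top-3 dense plus top-2 sparse query matches, deduplicated."""
    def top(scores, k):
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

    seeds = top(dense_q, k_dense) + top(sparse_q, k_sparse)
    return list(dict.fromkeys(seeds))  # drop repeats, keep first-seen order


def retrieve(adj, seeds):
    """Step 3: breadth-first expansion to every document reachable from a seed."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(seen)  # documents outside this set are filtered out
```

In this sketch a document is dropped only if it is unreachable from every seed, so a distractor that happens to rank among the top query matches would still be kept; how the actual system handles that case is not stated in the caption.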