WikiCausal: Corpus and Evaluation Framework for Causal Knowledge Graph Construction

Oktie Hassanzadeh

TL;DR

WikiCausal tackles the lack of standardized evaluation for causal knowledge graphs extracted from text by introducing a Wikipedia-based corpus linked to Wikidata event concepts and a two-pronged evaluation framework. It combines recall evaluation against a Wikidata-derived Base KG with a novel, LLM-driven precision verification to enable end-to-end assessment of causal extraction pipelines. The authors implement a modular extraction pipeline (including CauseNet-based and QA+Linking variants) and compare four knowledge graphs, revealing how model choices affect performance across class- and instance-level relations. The resources are publicly available to support reproducible benchmarking and practical deployment in domains like risk analysis, decision support, and event forecasting.

Abstract

Recently, there has been an increasing interest in the construction of general-domain and domain-specific causal knowledge graphs. Such knowledge graphs enable reasoning for causal analysis and event prediction, and so have a range of applications across different domains. While great progress has been made toward automated construction of causal knowledge graphs, the evaluation of such solutions has either focused on low-level tasks (e.g., cause-effect phrase extraction) or on ad hoc evaluation data and small manual evaluations. In this paper, we present a corpus, task, and evaluation framework for causal knowledge graph construction. Our corpus consists of Wikipedia articles for a collection of event-related concepts in Wikidata. The task is to extract causal relations between event concepts from the corpus. The evaluation is performed in part using existing causal relations in Wikidata to measure recall, and in part using Large Language Models to avoid the need for manual or crowd-sourced evaluation. We evaluate a pipeline for causal knowledge graph construction that relies on neural models for question answering and concept linking, and show how the corpus and the evaluation framework allow us to effectively find the right model for each task. The corpus and the evaluation framework are publicly available.
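
The two-part evaluation described above (recall against existing Wikidata relations, plus LLM-based verification of extracted relations) can be illustrated with a minimal sketch. This is not the authors' implementation: the edge representation, the function names `recall_against_base` and `verified_precision`, the example concept labels, and the stub verifier standing in for an LLM call are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical representation: a causal edge is a (cause, effect) pair of concept IDs.
Edge = Tuple[str, str]


def recall_against_base(extracted: Set[Edge], base: Set[Edge]) -> float:
    """Fraction of base-KG causal edges that also appear in the extracted KG."""
    if not base:
        return 0.0
    return len(extracted & base) / len(base)


def verified_precision(extracted: Iterable[Edge],
                       verify: Callable[[Edge], bool]) -> float:
    """Fraction of extracted edges that a verifier accepts as causal.

    `verify` is a placeholder for an LLM-based check; here it is any callable.
    """
    edges = list(extracted)
    if not edges:
        return 0.0
    accepted = sum(1 for edge in edges if verify(edge))
    return accepted / len(edges)


# Illustrative data only; labels are used in place of real Wikidata identifiers.
base_kg = {("earthquake", "tsunami"), ("drought", "famine")}
extracted_kg = {("earthquake", "tsunami"), ("election", "protest")}


def stub_verifier(edge: Edge) -> bool:
    """Stand-in for an LLM verdict: accepts a fixed set of edges."""
    return edge in {("earthquake", "tsunami")}


recall = recall_against_base(extracted_kg, base_kg)
precision = verified_precision(extracted_kg, stub_verifier)
```

Keeping the verifier as a plain callable means the same precision function could be driven by different language models, or by manual labels, without changing the metric code.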

Paper Structure

This paper contains 6 sections and 3 figures.

Figures (3)

  • Figure 1: Examples of Event-Related Causal Knowledge in Wikidata and Wikipedia
  • Figure 2: Selected Top-Level News Event Concepts
  • Figure 3: Example JSON Document