
ACCESS: A Benchmark for Abstract Causal Event Discovery and Reasoning

Vy Vo, Lizhen Qu, Tao Feng, Yuncheng Hua, Xiaoxi Kang, Songhai Fan, Tim Dwyer, Lay-Ki Soon, Gholamreza Haffari

TL;DR

ACCESS addresses the gap between surface-level causal event detection and robust abstract causal reasoning by introducing a two-phase benchmark that grounds event abstractions in GLUCOSE and then constructs causal graphs over 725 abstractions. The pipeline combines automatic clustering with human annotation to produce 1,494 causal relations across 9,513 stories, enabling evaluation of both abstraction quality and causal discovery. Experiments show that statistical structure learning struggles on sparse, abstract graphs and that large language models still struggle with non-contextual pairwise abstraction discovery, but incorporating ACCESS-derived abstract causal graphs significantly boosts QA reasoning in LLMs. The work contributes a reproducible pipeline and underscores the need to improve abstraction granularity and abstract causal representation learning for robust AI reasoning.
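The TL;DR attributes part of the difficulty for statistical structure learning to sparse graphs. As a rough worked check, assuming (for illustration only) that the 1,494 relations are directed edges of a single graph over the 725 abstractions with no self-loops, the edge density and mean out-degree are:

```latex
\text{density} = \frac{|E|}{n(n-1)} = \frac{1494}{725 \times 724} = \frac{1494}{524900} \approx 0.0028,
\qquad
\frac{|E|}{n} = \frac{1494}{725} \approx 2.06
```

Under this assumption, fewer than 0.3% of the possible ordered pairs of abstractions carry an edge.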

Abstract

Identifying cause-and-effect relationships is critical to understanding real-world dynamics and, ultimately, to causal reasoning. Existing methods for identifying event causality in NLP, including those based on Large Language Models (LLMs), struggle in out-of-distribution settings due to the limited scale of available benchmarks and a heavy reliance on the lexical cues they contain. Modern benchmarks, inspired by probabilistic causal inference, have attempted to construct causal graphs of events as a robust representation of causal knowledge; \texttt{CRAB} \citep{romanou2023crab} is one recent benchmark along this line. In this paper, we introduce \texttt{ACCESS}, a benchmark designed for discovery and reasoning over abstract causal events. Unlike existing resources, \texttt{ACCESS} focuses on the causality of everyday life events at the abstraction level. We propose a pipeline for identifying abstractions over event generalizations from \texttt{GLUCOSE} \citep{mostafazadeh-etal-2020-glucose}, a large-scale dataset of implicit commonsense causal knowledge, from which we subsequently extract 1.4K causal pairs. Our experiments highlight the ongoing challenges of using statistical methods and/or LLMs for automatic abstraction identification and causal discovery in NLP. Nonetheless, we demonstrate that the abstract causal knowledge provided in \texttt{ACCESS} can be leveraged to enhance QA reasoning performance in LLMs.
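Figure 1 describes Phase 2 as mapping the abstractions back to the original stories to build a representation such as a co-occurrence matrix. The sketch below shows one simple version of that step, assuming each story has already been annotated with the set of abstraction labels it realizes; the labels, stories, and the helper `occurrence_matrix` are hypothetical illustrations, not the paper's code, and the paper's actual construction (for example, how context is encoded) may differ.

```python
import numpy as np

# Hypothetical abstraction labels and story annotations, for illustration only.
abstractions = ["feel hungry", "eat food", "feel full"]
stories = [
    {"feel hungry", "eat food"},
    {"eat food", "feel full"},
    {"feel hungry", "eat food", "feel full"},
]


def occurrence_matrix(stories, abstractions):
    """Binary story-by-abstraction matrix: X[i, j] = 1 if story i maps to abstraction j."""
    index = {a: j for j, a in enumerate(abstractions)}
    X = np.zeros((len(stories), len(abstractions)), dtype=int)
    for i, mentioned in enumerate(stories):
        for a in mentioned:
            X[i, index[a]] = 1
    return X


X = occurrence_matrix(stories, abstractions)
# C[j, k] counts the stories containing both abstraction j and abstraction k;
# the diagonal C[j, j] is the number of stories that contain abstraction j.
C = X.T @ X
print(C)
```

A matrix like `X` (or the counts in `C`) is the kind of input that statistical structure learning methods consume; the symmetric counts alone carry no causal direction, which is why a discovery algorithm is still needed on top.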

Paper Structure

This paper contains 44 sections, 3 equations, 2 figures, and 17 tables.

Figures (2)

  • Figure 1: Pipeline of abstract causal event discovery. An event is viewed at three hierarchical levels: mention (realization in a specific text corpus), generalization (conceptualization of the event's components) and abstraction (group of causally consistent generalizations). Given a collection of event mentions, Phase 1 produces a collection of abstractions $A$, $B$, $C$ that are mapped back to the original corpus to construct a suitable representation in Phase 2, such as a co-occurrence matrix. Causal discovery algorithms can then be employed to detect causal relations within the data, optionally taking the surrounding context into account.
  • Figure 2: SHD (structural Hamming distance, left) and F1 score (right) of DAGs estimated by statistical structure learning methods. Lower SHD and higher F1 are better.
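Figure 2 scores estimated DAGs with SHD and F1. The following is a minimal sketch of both metrics on adjacency matrices, assuming `A[i, j] = 1` encodes an edge i → j, no self-loops, and F1 computed over directed edges; these conventions (and the helper names `shd` and `edge_f1`) are assumptions for illustration, and the paper's evaluation code may count reversed edges or compute F1 differently.

```python
import numpy as np


def shd(true_adj, est_adj):
    """Structural Hamming distance: edge insertions, deletions and reversals.

    Assumes A[i, j] = 1 means i -> j and there are no self-loops.
    A reversed edge counts as one error, not two.
    """
    diff = np.abs(np.asarray(true_adj, dtype=int) - np.asarray(est_adj, dtype=int))
    # Fold (i, j) and (j, i) together: a missing/extra edge leaves one mismatch
    # in the pair, a reversal leaves two; clipping makes either count as 1 per pair.
    pair = np.clip(diff + diff.T, 0, 1)
    return int(pair.sum() // 2)


def edge_f1(true_adj, est_adj):
    """F1 over directed edges; a reversed edge is both a false positive and a false negative."""
    t = np.asarray(true_adj).astype(bool)
    e = np.asarray(est_adj).astype(bool)
    tp = np.sum(t & e)
    if tp == 0:
        return 0.0
    precision = tp / e.sum()
    recall = tp / t.sum()
    return float(2 * precision * recall / (precision + recall))


# Toy example: true graph 0 -> 1 -> 2; the estimate reverses 0 -> 1 and adds 0 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
est_adj = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(shd(true_adj, est_adj))  # 2 (one reversal + one extra edge)
print(round(edge_f1(true_adj, est_adj), 3))  # 0.4 (precision 1/3, recall 1/2)
```

Counting a reversal once in SHD but twice in directed-edge F1 is a deliberate choice here; a sparse true graph makes both metrics sensitive to a handful of spurious edges, which is consistent with the difficulty reported for structure learning on ACCESS.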