An Empirical Study of Causal Relation Extraction Transfer: Design and Data

Sydney Anuyah; Jack Vanschaik; Palak Jain; Sawyer Lehman; Sunandan Chakraborty

TL;DR

This work addresses open-domain causal relation extraction by evaluating cross-dataset transfer across six datasets with neural models. It finds that a BioBERT-BiGRU model generalizes robustly, and it introduces $F1_{phrase}$, a metric that emphasizes noun phrase localization during transfer. The study shows that augmenting training data with examples from diverse domains and annotation styles significantly improves transfer performance, and that the balance of implicit versus explicit causal sentences in the training data often matters more than its size. These findings support building scalable open-domain causal knowledge extraction systems that leverage diverse annotated data and domain-specialized embeddings.
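The point that training-set composition can matter more than size can be made concrete with a small sampler that builds an augmentation set with a controlled share of implicitly causal sentences. This is a minimal sketch, not the authors' pipeline: the `Example` record, the `mix_by_implicit_ratio` helper, and the assumption that each sentence already carries an implicit/explicit flag are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    text: str
    # Assumption: True when the causal link has no explicit cue such as "because".
    implicit: bool


def mix_by_implicit_ratio(
    pool: List[Example], n: int, implicit_fraction: float, seed: int = 0
) -> List[Example]:
    """Hypothetical helper: sample n examples with a fixed implicit/explicit mix."""
    if not 0.0 <= implicit_fraction <= 1.0:
        raise ValueError("implicit_fraction must be in [0, 1]")
    implicit = [ex for ex in pool if ex.implicit]
    explicit = [ex for ex in pool if not ex.implicit]
    n_implicit = round(n * implicit_fraction)
    n_explicit = n - n_implicit
    if n_implicit > len(implicit) or n_explicit > len(explicit):
        raise ValueError("pool too small for the requested composition")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mixed = rng.sample(implicit, n_implicit) + rng.sample(explicit, n_explicit)
    rng.shuffle(mixed)  # avoid all implicit examples appearing first
    return mixed
```

Holding `n` fixed while varying `implicit_fraction` is one way to separate the effect of composition from the effect of data size.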

Abstract

We conduct an empirical analysis of neural network architectures and data transfer strategies for causal relation extraction. Through experiments with various contextual embedding layers and architectural components, we show that a relatively straightforward BioBERT-BiGRU relation extraction model generalizes better than other architectures across varying web-based sources and annotation strategies. Furthermore, we introduce $F1_{phrase}$, a metric for evaluating transfer performance that emphasizes noun phrase localization rather than direct matching of target tags. Using this metric, we conduct data transfer experiments, which reveal that augmenting training data with examples from varying domains and annotation styles can improve performance. Data augmentation is especially beneficial when the training data includes an adequate proportion of both implicitly and explicitly causal sentences.
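The abstract does not spell out how $F1_{phrase}$ is computed. As a rough illustration only, the sketch below contrasts a strict token-level tag F1 with a hypothetical phrase-level F1 in which a predicted cause or effect span counts as correct if it overlaps a gold span of the same label. The overlap criterion, the BIO tag scheme (B-C/I-C for cause, B-E/I-E for effect), and all function names are our assumptions, not the paper's definition.

```python
from typing import List, Optional, Tuple

# A labeled span: (start token, end token exclusive, label such as "C" or "E").
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group BIO tags (e.g. B-C, I-C, B-E, I-E, O) into labeled spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # token extends the current span
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":  # B-x, or an I-x that does not continue the current span
            start, label = i, tag[2:]
    return spans


def _f1(tp: int, n_pred: int, n_gold: int) -> float:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def tag_f1(gold: List[str], pred: List[str]) -> float:
    """Strict baseline: a non-O token is correct only if its tag matches exactly."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    return _f1(tp, sum(p != "O" for p in pred), sum(g != "O" for g in gold))


def phrase_f1(gold: List[str], pred: List[str]) -> float:
    """Hypothetical phrase-level F1: same-label spans that overlap count as a match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    matched = set()  # each gold span may be matched at most once
    tp = 0
    for ps, pe, pl in pred_spans:
        for j, (gs, ge, gl) in enumerate(gold_spans):
            if j not in matched and pl == gl and ps < ge and gs < pe:
                matched.add(j)
                tp += 1
                break
    return _f1(tp, len(pred_spans), len(gold_spans))


# "Heavy rainfall caused severe flooding in the valley"
gold = ["B-C", "I-C", "O", "B-E", "I-E", "O", "O", "O"]
pred = ["O", "B-C", "O", "B-E", "I-E", "I-E", "O", "O"]
print(tag_f1(gold, pred))     # 0.5
print(phrase_f1(gold, pred))  # 1.0
```

Here the predicted cause drops a boundary token and the predicted effect adds one, so the strict tag F1 is 0.5 while the overlap-based score is 1.0. This boundary tolerance is the kind of behavior a localization-oriented metric is meant to reward when annotation conventions differ across datasets.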
