CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision
Zhongyang Li, Xiao Ding, Kuo Liao, Bing Qin, Ting Liu
TL;DR
This paper tackles the lack of causal reasoning in pretrained models by introducing CausalBERT, a three-stage framework that injects causal knowledge from multi-granularity resources (sentence-level like CausalBank and ConceptNet; word-level via templates, CausalNet, and embeddings) through two auxiliary pre-training tasks and a regularization-based forgetting mechanism. It systematically evaluates on seven benchmarks across causal pair classification, COPA-style causal inference, and causal QA, achieving state-of-the-art or competitive results, including 93.5% COPA accuracy with ALBERT-xxlarge and strong zero-shot performance. The findings show that carefully curated causal resources can significantly enhance causal understanding, though effectiveness is task-dependent and sensitive to resource noise and forgetting. Overall, CausalBERT demonstrates the feasibility and impact of integrating causal knowledge into pretrained models for improved natural language reasoning.
Abstract
Recent work has shown success in incorporating pre-trained models like BERT to improve NLP systems. However, existing pre-trained models lack of causal knowledge which prevents today's NLP systems from thinking like humans. In this paper, we investigate the problem of injecting causal knowledge into pre-trained models. There are two fundamental problems: 1) how to collect various granularities of causal pairs from unstructured texts; 2) how to effectively inject causal knowledge into pre-trained models. To address these issues, we extend the idea of CausalBERT from previous studies, and conduct experiments on various datasets to evaluate its effectiveness. In addition, we adopt a regularization-based method to preserve the already learned knowledge with an extra regularization term while injecting causal knowledge. Extensive experiments on 7 datasets, including four causal pair classification tasks, two causal QA tasks and a causal inference task, demonstrate that CausalBERT captures rich causal knowledge and outperforms all pre-trained models-based state-of-the-art methods, achieving a new causal inference benchmark.
