
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling

TL;DR

Pre-trained language representation models rarely incorporate commonsense knowledge explicitly. This paper introduces Align, Mask and Select (AMS), a method that automatically converts ConceptNet triples into a large, natural-language multi-choice question answering (MCQA) dataset for pre-training. Models pre-trained on this dataset with an MCQA objective (BERT_CS) improve significantly on the commonsense benchmarks CommonsenseQA (CSQA) and Winograd Schema Challenge (WSC) while keeping comparable performance on GLUE. Ablations show that MCQA-based pre-training on linguistically natural data is more effective than masked language model (MLM) pre-training or pre-training on triples alone. The results suggest that knowledge-augmented pre-training is a viable path to better commonsense reasoning in language models, with potential applicability to other architectures such as XLNet and RoBERTa.

Abstract

The state-of-the-art pre-trained language representation models, such as Bidirectional Encoder Representations from Transformers (BERT), rarely incorporate commonsense knowledge or other knowledge explicitly. We propose a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge. We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models. These results verify that the proposed approach, while significantly improving commonsense-related NLP tasks, does not degrade the general language representation capabilities.
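To make the idea concrete, the three steps named by AMS can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "align" means finding sentences that mention both concepts of a triple, that "mask" means replacing the answer concept with a placeholder token, and that "select" means drawing distractors from other triples that share the same head concept and relation. The function names, the `[QW]` placeholder, the substring matching, and the toy data are all hypothetical.

```python
import random

MASK_TOKEN = "[QW]"  # assumed placeholder for the masked concept


def align(triple, sentences):
    """Align: keep sentences that mention both concepts of the triple.

    Uses naive lowercase substring matching, which is an assumption
    made for brevity.
    """
    head, _relation, tail = triple
    return [s for s in sentences if head in s.lower() and tail in s.lower()]


def mask(sentence, concept):
    """Mask: replace the first occurrence of the concept with MASK_TOKEN."""
    idx = sentence.lower().find(concept)
    return sentence[:idx] + MASK_TOKEN + sentence[idx + len(concept):]


def select(triple, triples, n_distractors, rng):
    """Select: sample distractors from tails of triples that share the
    same head and relation but have a different tail (assumed criterion)."""
    head, relation, tail = triple
    pool = sorted({t for (h, r, t) in triples
                   if h == head and r == relation and t != tail})
    if len(pool) < n_distractors:
        return None  # not enough distractors; caller skips this example
    return rng.sample(pool, n_distractors)


def build_examples(triples, sentences, n_distractors=4, seed=0):
    """Turn (head, relation, tail) triples into multi-choice QA examples."""
    rng = random.Random(seed)
    examples = []
    for triple in triples:
        answer = triple[2]
        for sentence in align(triple, sentences):
            distractors = select(triple, triples, n_distractors, rng)
            if distractors is None:
                continue
            choices = distractors + [answer]
            rng.shuffle(choices)
            examples.append({
                "question": mask(sentence, answer),
                "choices": choices,
                "answer": answer,
            })
    return examples


# Toy data, invented for illustration only.
toy_triples = [
    ("bird", "CapableOf", "fly"),
    ("bird", "CapableOf", "sing"),
    ("bird", "CapableOf", "eat"),
]
toy_sentences = ["A bird can fly over the lake."]
toy_examples = build_examples(toy_triples, toy_sentences, n_distractors=2)
```

In this toy run only the first triple aligns with the sentence, so a single question is produced with `fly` as the answer and the other two tails as distractors. A model pre-trained on such data would be asked to pick the correct choice for the masked position.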

Paper Structure

This paper contains 18 sections, 3 equations, 1 figure, 8 tables.

Figures (1)

  • Figure 1: BERT_CS$_{base}$ and BERT_CS$_{large}$ accuracy on the CSQA development set against the number of pre-training steps.