TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models

Junbing Yan; Chengyu Wang; Taolin Zhang; Xiaofeng He; Jun Huang; Longtao Huang; Hui Xue; Wei Zhang

TL;DR

This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. It injects knowledge triples in a noise-robust way and uses a knowledge-augmented memory bank to capture valuable information.

Abstract

Knowledge-enhanced pre-trained language models (KEPLMs) utilize external knowledge to enhance language understanding. Previous models acquire knowledge through knowledge-related pre-training tasks learned from relation triples in knowledge graphs. However, these models do not prioritize learning embeddings for entity-related tokens. Moreover, updating the entire set of parameters in KEPLMs is computationally demanding. This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. We observe that entities in text corpora usually follow a long-tail distribution, so the representations of some entities are poorly optimized and hinder the pre-training process for KEPLMs. To tackle this, we adopt a robust approach to injecting knowledge triples and use a knowledge-augmented memory bank to capture valuable information. Furthermore, updating a small subset of neurons in the feed-forward networks (FFNs) that store factual knowledge is both sufficient and efficient. Specifically, we utilize dynamic knowledge routing to identify knowledge paths in FFNs and selectively update parameters during pre-training. Experimental results show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
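To make the idea of selectively updating FFN neurons concrete, the following is a minimal sketch in plain Python. It is not the paper's dynamic knowledge routing procedure: the per-neuron scores, the top-k selection rule, and the function names `top_k_neurons` and `masked_sgd_step` are all assumptions introduced for illustration. It only shows how a gradient step can be restricted to a chosen subset of neurons while the remaining parameters stay frozen.

```python
from typing import List, Sequence, Set


def top_k_neurons(scores: Sequence[float], k: int) -> Set[int]:
    """Pick the k neurons with the highest scores.

    Assumption: `scores` stands in for some per-neuron relevance
    measure; how such scores are obtained is not specified here.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def masked_sgd_step(
    weights: List[List[float]],
    grads: List[List[float]],
    active: Set[int],
    lr: float,
) -> List[List[float]]:
    """Apply one SGD step only to the rows (neurons) listed in `active`.

    Rows outside `active` are copied unchanged, i.e. they are frozen.
    """
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in active else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]


# Toy FFN layer with 4 neurons, each holding 2 weights.
scores = [0.1, 0.9, 0.3, 0.7]            # hypothetical relevance scores
weights = [[1.0, 1.0] for _ in scores]
grads = [[0.5, 0.5] for _ in scores]

active = top_k_neurons(scores, k=2)       # neurons selected for updating
updated = masked_sgd_step(weights, grads, active, lr=0.5)
```

In this toy setting only half of the rows receive an update, which is the source of the efficiency argument: the cost of the parameter update scales with the number of selected neurons rather than with the full layer.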

Paper Structure (21 sections, 12 equations, 8 figures, 6 tables)

Introduction
Related Work
KEPLMs
Attribution Methods in Transformers
Attribution for KEPLMs
TRELM: The Proposed Framework
Noise-aware Knowledge Injection
Enhancing Representations with Knowledge-augmented Memory Bank
Learning with Dynamic Knowledge Paths
Summarization of Pre-training Process
Experiments
Experimental Setup
Knowledge-aware Tasks
Language Understanding Tasks
Analysis of Pre-training Efficiency
...and 6 more sections

Figures (8)

Figure 1: Comparison between TRELM and other models. (a) Plain PLMs usually utilize masked language modeling as the pre-training objective. (b) Some KEPLMs utilize external knowledge sources (e.g., KGs) and design knowledge-aware tasks which need additional knowledge encoders. (c) During pre-training, TRELM uses a BERT-style shared encoder and a knowledge-augmented memory bank to inject factual knowledge. Moreover, we only need to update partial FFN parameters in Transformer blocks with a dynamic knowledge routing method.
Figure 2: Model overview. (1) Input: Detecting important entities and long-tail words to reduce knowledge noise. (2) Knowledge-augmented Memory Bank: Querying the important knowledge learned previously through a "cheat sheet" that contains semantic information of entities and words. (3) Dynamic Knowledge Routing: Finding the knowledge paths related to the knowledge-aware task and selectively updating the model's parameters.
Figure 3: Injection method efficiency over Open Entity and TACRED.
Figure 4: The curves of the pre-training loss.
Figure 5: F1 score on Open Entity and TACRED for models trained under the same experiment setting.
...and 3 more figures
