Preserving Generalization of Language models in Few-shot Continual Relation Extraction

Quyen Tran; Nguyen Xuan Thanh; Nguyen Hoang Anh; Nam Le Hai; Trung Le; Linh Van Ngo; Thien Huu Nguyen

TL;DR

This paper introduces a method for Few-shot Continual Relation Extraction (FCRE) that reuses the often-discarded language model heads and strategically aligns the primary classification head with them, thereby enhancing model performance. It also explores the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges.

Abstract

Few-shot Continual Relation Extraction (FCRE) is an emerging area of study in which models must sequentially integrate knowledge of new relations from limited labeled data, while avoiding catastrophic forgetting and preserving the prior knowledge of their pre-trained backbones. In this work, we introduce a novel method that leverages the often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
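The abstract does not say which mutual information estimator is used, so the following is only a minimal sketch of one common choice, the InfoNCE lower bound, and should not be read as the authors' implementation. It assumes two feature vectors per sample, one from the primary classification head and one from the language model head; the function name `info_nce`, the `temperature` value, and the toy inputs are all hypothetical.

```python
import math


def info_nce(main_feats, lm_feats, temperature=0.1):
    """InfoNCE loss between paired feature lists (assumed setup, not from the paper).

    main_feats[i] and lm_feats[i] are treated as a positive pair; every other
    lm_feats[j] in the batch acts as a negative. Minimizing this loss maximizes
    the lower bound I(main; lm) >= log(N) - loss.
    """
    def normalize(v):
        # Scale to unit length so dot products are cosine similarities.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in main_feats]
    b = [normalize(v) for v in lm_feats]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Similarity of sample i's main-head feature to every LM-head feature.
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive pair (index i) as the target.
        total += log_denom - logits[i]
    return total / n


# Toy check: aligned heads give a much lower loss than mismatched heads.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed formulation, the loss is small when the two heads agree on which samples belong together and large when they do not, which is the sense in which such a term would align the classification head with the language model head.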

Paper Structure (26 sections, 6 equations, 5 figures, 6 tables)

Introduction
Related work
Continual Learning (CL)
Few-shot Continual Relation Extraction
Background
Problem Formulation
Existing Concept of FCRE Models
Proposed Method
Mutual Information Maximization (MIM)
Discussion
Exploiting LLMs for FCRE
Motivations and Research questions
How to adapt BERT-based FCRE methods to LLMs?
Experimental Results
Experiment Setup
...and 11 more sections

Figures (5)

Figure 1: Accuracy drop (%) after learning eight tasks of methods on TACRED 5-way-5-shot. Lower is better.
Figure 2: Generalization gap regarding loss of models after training each task (TACRED 5-way-5-shot, seed=100).
Figure 3: Our Framework
Figure 5: t-SNE visualization of the representation of 10 relations from the first task of CPL+MI on the LM head after the last task (FewRel 10-way 5-shot).
Figure 6: Adapting LLMs for FCRE problems
