Fine-Tuning Language Models on Multiple Datasets for Citation Intention Classification

Zeren Shui, Petros Karypis, Daniel S. Karls, Mingjian Wen, Saurav Manchanda, Ellad B. Tadmor, George Karypis

TL;DR

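A multi-task learning (MTL) framework is proposed that jointly fine-tunes pretrained language models (PLMs) on a dataset of primary interest together with multiple auxiliary citation intention classification (CIC) datasets to take advantage of additional supervision signals. A data-driven task relation learning method controls each auxiliary dataset's contribution, avoiding negative transfer and expensive hyper-parameter tuning.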

Abstract

Citation Intention Classification (CIC) tools classify citations by their intention (e.g., background, motivation) and assist readers in evaluating the contribution of scientific literature. Prior research has shown that pretrained language models (PLMs) such as SciBERT can achieve state-of-the-art performance on CIC benchmarks. PLMs are trained via self-supervised tasks on a large corpus of general text and can quickly adapt to CIC tasks through moderate fine-tuning on the corresponding dataset. Despite these advantages, PLMs can easily overfit small datasets during fine-tuning. In this paper, we propose a multi-task learning (MTL) framework that jointly fine-tunes PLMs on a dataset of primary interest together with multiple auxiliary CIC datasets to take advantage of additional supervision signals. We develop a data-driven task relation learning (TRL) method that controls the contribution of the auxiliary datasets, avoiding negative transfer and expensive hyper-parameter tuning. We conduct experiments on three CIC datasets and show that fine-tuning with additional datasets can improve the PLMs' generalization performance on the primary dataset. PLMs fine-tuned with our proposed framework outperform the current state-of-the-art models by 7% to 11% on small datasets while performing on par with the best-performing model on a large dataset.
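As a rough illustration of the joint fine-tuning objective, the sketch below assumes it takes the common form of a primary-dataset loss plus auxiliary-dataset losses scaled by per-dataset weights $\lambda$ (the weights swept in Figures 2 and 4). The exact loss, the function names (`cross_entropy`, `mean_loss`, `mtl_loss`), and the assumption that each dataset keeps its own classification head on top of the shared encoder are ours, not taken from the paper. Logits are passed in directly so the example stays framework-free.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One example = (logits from that dataset's own classifier head, gold label index).
Example = Tuple[Sequence[float], int]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of `label` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def mean_loss(batch: List[Example]) -> float:
    """Average cross-entropy over one dataset's mini-batch."""
    return sum(cross_entropy(logits, y) for logits, y in batch) / len(batch)


def mtl_loss(
    primary_batch: List[Example],
    aux_batches: Dict[str, List[Example]],
    lambdas: Dict[str, float],
) -> float:
    """Assumed objective: L = L_primary + sum_k lambda_k * L_aux_k.

    Setting every lambda_k to 0 recovers fine-tuning on the primary dataset alone.
    """
    loss = mean_loss(primary_batch)
    for name, batch in aux_batches.items():
        loss += lambdas[name] * mean_loss(batch)
    return loss


if __name__ == "__main__":
    primary = [([2.0, 0.5, -1.0], 0)]   # hypothetical 3-way primary label space
    aux = {"aux_a": [([0.0, 0.0], 1)]}  # hypothetical 2-way auxiliary dataset
    print(mtl_loss(primary, aux, {"aux_a": 0.5}))
    print(mtl_loss(primary, aux, {"aux_a": 0.0}))
```

Keeping a separate head per dataset lets auxiliary datasets with different label sets share only the encoder, which matches the caption's statement that the language model parameters are shared across datasets.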

Paper Structure

This paper contains 35 sections, 6 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of the architecture of our multi-task learning framework. (a) An overview of the MTL training process; the same language model parameters are shared across all datasets. (b) The three readout operations (CLS, MEAN, CITED HERE) applied to the language model embeddings to generate representations of citation contexts. (c) The task relation learning (TRL) method: we train a classification model on an auxiliary dataset and then evaluate its information gain on the primary dataset.
  • Figure 2: BERT performance on all pairwise combinations of primary and auxiliary datasets with different values of $\lambda$. The yellow line denotes the baseline performance of fine-tuning on the primary dataset only. The blue line denotes the performance of fine-tuning on the primary and the auxiliary dataset with different values of $\lambda$. The star and the triangle indicate the $\lambda$ found by our TRL method and by the grid search method, respectively. Times in brackets indicate the GPU time needed by each method.
  • Figure 3: Performance (Macro-F1 with standard deviation) of the three readout functions: CLS, MEAN, and CITED HERE on BERT and SciBERT.
  • Figure 4: SciBERT performance on all pairwise combinations of primary and auxiliary datasets with different values of $\lambda$. The yellow line denotes the baseline performance of fine-tuning on the primary dataset only. The blue line denotes the performance of fine-tuning on the primary and the auxiliary dataset with different values of $\lambda$. The star and the triangle indicate the $\lambda$ found by our TRL method and by the grid search method, respectively. Times in brackets indicate the GPU time needed by each method.
  • Figure 5: t-SNE visualization of the citation contexts in different datasets. Citation contexts are encoded by SciBERT using the CLS readout function.
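The three readout operations named in Figure 1(b) can be sketched over a list of per-token hidden states. This is a framework-free illustration, not the paper's code: CLS takes the first token's state, MEAN averages the non-padding tokens, and for CITED HERE we assume (hypothetically) that the cited reference in the citation context is replaced by a marker token, spelled `[CITED_HERE]` here, whose hidden state serves as the representation. The marker spelling and all function names are our own placeholders.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def cls_readout(hidden: Sequence[Vector]) -> Vector:
    """Use the hidden state of the first token ([CLS] in BERT-style models)."""
    return list(hidden[0])


def mean_readout(hidden: Sequence[Vector], mask: Optional[Sequence[int]] = None) -> Vector:
    """Average hidden states over tokens whose attention-mask entry is 1."""
    if mask is None:
        mask = [1] * len(hidden)
    kept = [h for h, m in zip(hidden, mask) if m]
    dim = len(hidden[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def cited_here_readout(
    hidden: Sequence[Vector], tokens: Sequence[str], marker: str = "[CITED_HERE]"
) -> Vector:
    """Use the hidden state at the (hypothetical) citation-marker position."""
    return list(hidden[list(tokens).index(marker)])


def readout(hidden, tokens, mask, mode: str) -> Vector:
    """Dispatch on the readout names used in Figure 1(b)."""
    if mode == "CLS":
        return cls_readout(hidden)
    if mode == "MEAN":
        return mean_readout(hidden, mask)
    if mode == "CITED HERE":
        return cited_here_readout(hidden, tokens)
    raise ValueError(f"unknown readout: {mode}")


if __name__ == "__main__":
    tokens = ["[CLS]", "we", "follow", "[CITED_HERE]", "[SEP]", "[PAD]"]
    hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [9.0, 9.0]]
    mask = [1, 1, 1, 1, 1, 0]
    print(readout(hidden, tokens, mask, "CLS"))         # [1.0, 0.0]
    print(readout(hidden, tokens, mask, "MEAN"))        # [1.2, 0.8]
    print(readout(hidden, tokens, mask, "CITED HERE"))  # [3.0, 1.0]
```

Note that MEAN ignores the padding position (the `[9.0, 9.0]` vector); without the mask, padding would skew the average.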
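Figure 1(c) describes TRL as training a classifier on an auxiliary dataset and then measuring its information gain on the primary dataset. One plausible reading, which we assume here rather than take from the paper, is the textbook information gain IG = H(Y) - H(Y | Z), where Y is the primary gold label and Z is the auxiliary model's predicted (auxiliary) label for the same primary example. How the paper turns this score into a weight $\lambda$ is not described in this section, so the sketch stops at the score; the function names and example labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(
    primary_labels: Sequence[Hashable], aux_predictions: Sequence[Hashable]
) -> float:
    """IG = H(Y) - H(Y | Z): how much the auxiliary model's predicted class Z
    on each primary example reduces uncertainty about the primary label Y."""
    if len(primary_labels) != len(aux_predictions):
        raise ValueError("need one auxiliary prediction per primary example")
    n = len(primary_labels)
    conditional = 0.0
    for z, count in Counter(aux_predictions).items():
        group = [y for y, p in zip(primary_labels, aux_predictions) if p == z]
        conditional += (count / n) * entropy(group)
    return entropy(primary_labels) - conditional


if __name__ == "__main__":
    gold = ["background", "background", "motivation", "motivation"]
    # Auxiliary predictions that perfectly separate the primary classes...
    print(information_gain(gold, ["uses", "uses", "extends", "extends"]))  # 1.0
    # ...versus predictions that carry no information about them.
    print(information_gain(gold, ["uses", "extends", "uses", "extends"]))  # 0.0
```

Under this reading, an auxiliary dataset whose classifier yields a higher score on the primary data would be a candidate for a larger $\lambda$, which is the kind of signal that could replace the grid search compared in Figures 2 and 4.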