Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study

Di Wu; Wasi Uddin Ahmad; Kai-Wei Chang

TL;DR

The paper addresses how pre-trained language models compare to non-PLM baselines for keyphrase extraction and generation, formulating extraction as sequence labeling and generation as seq2seq. It conducts a comprehensive empirical study across science, news, and online forums, evaluating encoder-decoder, encoder-only, and domain-specific PLMs under various design choices, including in-domain pre-training and parameter budgets. Key findings show that large or in-domain seq2seq PLMs approach state-of-the-art performance, while in-domain encoder-only PLMs offer strong, data-efficient results; deeper encoders with shallower decoders are generally preferred under fixed budgets, and task formulation (extraction vs generation) interacts with model choice. The work provides concrete guidance for practitioners and contributes pre-trained domain-specific models (SciBART, NewsBART, NewsBERT) to advance keyphrase generation in scientific and real-world settings. It further demonstrates that task-specific pre-training does not fully replace in-domain adaptation, underscoring the importance of domain-aware fine-tuning for reliable keyphrase generation.

Abstract

Neural models that do not rely on pre-training have excelled in the keyphrase generation task with large annotated datasets. Meanwhile, new approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, a systematic study of how the two types of approaches compare, and of how different design choices affect the performance of PLM-based models, is lacking. To fill this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs have competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices, including in-domain PLMs, PLMs with different pre-training objectives, using PLMs with a parameter budget, and different formulations for present keyphrases. Further results show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) with a fixed parameter budget, prioritizing model depth over width and allocating more layers to the encoder leads to better encoder-decoder models; and (3) by introducing four in-domain PLMs, we achieve competitive performance in the news domain and state-of-the-art performance in the scientific domain.
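
The two task formulations named in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, case-insensitive exact matching, a B/I/O tag scheme, and a " ; " separator with present keyphrases placed before absent ones; none of these details are stated in the abstract, and the function names (`find_sublist`, `bio_tags`, `split_present_absent`, `seq2seq_target`) are hypothetical.

```python
from typing import List, Tuple


def find_sublist(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return every start index where `phrase` occurs contiguously in `tokens`."""
    n, m = len(tokens), len(phrase)
    if m == 0:
        return []
    return [i for i in range(n - m + 1) if tokens[i:i + m] == phrase]


def bio_tags(doc_tokens: List[str], keyphrases: List[str]) -> List[str]:
    """Sequence-labeling view: tag each token as B, I, or O (assumed scheme)."""
    tags = ["O"] * len(doc_tokens)
    lowered = [t.lower() for t in doc_tokens]
    for kp in keyphrases:
        kp_tokens = kp.lower().split()
        for start in find_sublist(lowered, kp_tokens):
            span = range(start, start + len(kp_tokens))
            # Skip a match that overlaps an already tagged span.
            if any(tags[i] != "O" for i in span):
                continue
            tags[start] = "B"
            for i in span[1:]:
                tags[i] = "I"
    return tags


def split_present_absent(
    doc_tokens: List[str], keyphrases: List[str]
) -> Tuple[List[str], List[str]]:
    """A keyphrase is 'present' if it appears verbatim in the document."""
    lowered = [t.lower() for t in doc_tokens]
    present, absent = [], []
    for kp in keyphrases:
        found = find_sublist(lowered, kp.lower().split())
        (present if found else absent).append(kp)
    return present, absent


def seq2seq_target(present: List[str], absent: List[str], sep: str = " ; ") -> str:
    """Seq2seq view: one target string (ordering and separator are assumptions)."""
    return sep.join(present + absent)


doc = "Pre-trained language models improve keyphrase generation".split()
kps = ["language models", "keyphrase generation", "data efficiency"]

tags = bio_tags(doc, kps)
present, absent = split_present_absent(doc, kps)
target = seq2seq_target(present, absent)
```

Under these assumptions, a sequence labeler can only recover the two present keyphrases, while the sequence-to-sequence target also contains the absent keyphrase "data efficiency".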

Paper Structure (61 sections, 8 figures, 14 tables)

Introduction
Methods
Problem Definition
Keyphrase Extraction
Keyphrase Generation
Encoder-Decoder PLMs
Encoder-only PLMs
BERT2BERT
Mask Manipulation
Domain-specific PLMs
Scientific PLMs
News PLMs
Experimental Setup
Benchmarks
SciKP
...and 46 more sections

Figures (8)

Figure 1: An example news article with present and absent keyphrases highlighted in blue and red respectively. For better readability, the document is not tokenized.
Figure 2: Present keyphrase generation performance of different methods as a function of training set size.
Figure 3: Keyphrase generation performance of PLMs pre-trained in different domains. Each table shows the performance of BERT-G, SciBERT-G, and NewsBERT-G in the first row and the performance of BART, SciBART, and NewsBART in the second row.
Figure 4: Domain-specific encoder-only PLMs are available in a variety of domains. No prior work considered using these "domain experts" for keyphrase generation.
Figure 5: Domain distribution of the S2ORC dataset.
...and 3 more figures
