On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation

Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang

TL;DR

The paper studies keyphrase generation (KPG) with encoder-only PLMs, examining their efficacy, the architectural choices involved, and their performance relative to encoder-decoder models across domains. It examines two main formulations for using encoder-only PLMs, sequence labeling for present keyphrases and prefix-LM-based seq2seq generation, along with a BERT2BERT initialization approach, and evaluates them on KP20k (Science) and KPTimes (News). The key findings are that the KPG formulation yields a broader set of keyphrase predictions than sequence labeling, that prefix-LM fine-tuning is data-efficient and competitive with in-domain seq2seq PLMs, and that, for encoder-decoder setups, a deeper encoder paired with a shallower decoder gives better keyphrase quality and lower latency. The work offers practical guidance for deploying encoder-only KPG systems, reports domain transfer patterns (SciBERT transfers to News but not vice versa), and releases code and checkpoints to support further research.

Abstract

This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG) amidst the broader availability of domain-tailored encoder-only models compared to encoder-decoder models. We investigate three core inquiries: (1) the efficacy of encoder-only PLMs in KPG, (2) optimal architectural decisions for employing encoder-only PLMs in KPG, and (3) a performance comparison between in-domain encoder-only and encoder-decoder PLMs across varied resource settings. Our findings, derived from extensive experimentation in two domains, reveal that with encoder-only PLMs, although KPE with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation renders a broader spectrum of keyphrase predictions. Additionally, prefix-LM fine-tuning of encoder-only PLMs emerges as a strong and data-efficient strategy for KPG, outperforming general-domain seq2seq PLMs. We also identify a favorable parameter allocation towards model depth rather than width when employing encoder-decoder architectures initialized with encoder-only PLMs. The study sheds light on the potential of utilizing encoder-only PLMs for advancing KPG systems and provides a groundwork for future KPG methods. Our code and pre-trained checkpoints are released at https://github.com/uclanlp/DeepKPG.
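
For readers unfamiliar with prefix-LM fine-tuning, the sketch below illustrates the kind of attention mask that lets an encoder-only model generate a target sequence. It assumes the common UniLM-style formulation, in which source tokens attend to one another bidirectionally while target tokens attend to the full source and only to target positions up to and including their own. The function name prefix_lm_mask and its arguments are illustrative and are not taken from the paper's released code.

```python
from typing import List


def prefix_lm_mask(src_len: int, tgt_len: int) -> List[List[int]]:
    """Build a 0/1 attention mask for a concatenated [source; target] sequence.

    mask[i][j] == 1 means position i may attend to position j.
    Assumed (UniLM-style) rules, not confirmed against the paper's code:
      * every position may attend to every source position;
      * a target position may also attend to target positions up to itself;
      * source positions never attend to target positions.
    """
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < src_len:
                # The source segment is visible to all positions.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Causal (left-to-right) attention within the target segment.
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    # Three source tokens followed by two target tokens.
    for row in prefix_lm_mask(3, 2):
        print(row)
```

With three source tokens and two target tokens, the source rows see only the three source columns, and each target row additionally sees the target positions up to its own.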

Paper Structure

This paper contains 38 sections, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: An example of a news article with its present and absent keyphrases highlighted in blue and red, respectively.
  • Figure 2: Domain-specific encoder-only PLMs are available in a variety of domains. No prior work considered using these "domain experts" for KPG. In this paper, we show that these specialized encoder-only PLMs can be used to build strong and resource-efficient KPG models.
  • Figure 3: Present keyphrase generation performance of different methods as a function of training set size. Fine-tuning in-domain PLMs is much more data-efficient than the other approaches.
  • Figure 4: Inference speed of BERT2BERT models with different encoder-decoder configurations on GPU and CPU. All the data points are obtained with BERT2BERT models with 12 layers. A model with x decoder layers has 12-x encoder layers.
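
The layer budget described in the Figure 4 caption can be made concrete with a small sketch. It enumerates the encoder/decoder splits of a 12-layer model under the caption's rule that x decoder layers leave 12-x encoder layers. The helper name layer_splits is hypothetical, and the requirement of at least one layer on each side is an assumption; the caption does not say which splits were actually benchmarked.

```python
from typing import List, Tuple

TOTAL_LAYERS = 12  # total layer budget stated in the Figure 4 caption


def layer_splits(total: int = TOTAL_LAYERS) -> List[Tuple[int, int]]:
    """Return (encoder_layers, decoder_layers) pairs that sum to `total`.

    Assumption: both the encoder and the decoder keep at least one layer.
    """
    return [(total - x, x) for x in range(1, total)]


if __name__ == "__main__":
    for enc, dec in layer_splits():
        print(f"{enc} encoder layers + {dec} decoder layers")
```

A deep-encoder, shallow-decoder model corresponds to the early entries in this list, such as 10 encoder layers with 2 decoder layers.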