On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation
Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang
TL;DR
The paper tackles keyphrase generation (KPG) with encoder-only PLMs, addressing their efficacy, architectural choices, and cross-domain performance relative to encoder-decoder models. It examines two main formulations for using encoder-only PLMs, sequence labeling for present keyphrases and prefix-LM-based seq2seq generation, along with a BERT2BERT initialization approach, and evaluates them on KP20k (Science) and KPTimes (News). Key findings show that encoder-only models can generate broader keyphrase sets, that prefix-LM fine-tuning is data-efficient and competitive with in-domain seq2seq PLMs, and that for encoder-decoder setups, deeper encoders with shallower decoders yield better keyphrase quality and latency. The work provides practical guidance for deploying encoder-only KPG systems, demonstrates domain transfer patterns (SciBERT transfers to News but not vice versa), and releases code and checkpoints to foster further research.
Abstract
This study addresses the application of encoder-only Pre-trained Language Models (PLMs) to keyphrase generation (KPG), motivated by the fact that domain-tailored encoder-only models are more widely available than domain-tailored encoder-decoder models. We investigate three core questions: (1) the efficacy of encoder-only PLMs in KPG, (2) optimal architectural decisions for employing encoder-only PLMs in KPG, and (3) a performance comparison between in-domain encoder-only and encoder-decoder PLMs across varied resource settings. Our findings, derived from extensive experimentation in two domains, reveal that with encoder-only PLMs, although keyphrase extraction (KPE) with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation renders a broader spectrum of keyphrase predictions. Additionally, prefix-LM fine-tuning of encoder-only PLMs emerges as a strong and data-efficient strategy for KPG, outperforming general-domain seq2seq PLMs. We also find that, when employing encoder-decoder architectures initialized with encoder-only PLMs, allocating parameters to model depth rather than width is favorable. The study sheds light on the potential of utilizing encoder-only PLMs for advancing KPG systems and provides a groundwork for future KPG methods. Our code and pre-trained checkpoints are released at https://github.com/uclanlp/DeepKPG.
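To make the prefix-LM formulation concrete, the following is a minimal sketch of the attention mask commonly used for prefix-LM (UniLM-style) seq2seq fine-tuning of an encoder-only model: source tokens attend bidirectionally to the whole source, while target tokens attend to the whole source and only to target tokens at or before their own position. The function name `prefix_lm_mask` and its parameters are hypothetical and chosen for illustration; the released DeepKPG code may construct its mask differently.

```python
from typing import List


def prefix_lm_mask(src_len: int, tgt_len: int) -> List[List[int]]:
    """Build a prefix-LM attention mask (illustrative helper, not from the paper).

    mask[i][j] == 1 means position i may attend to position j.
    Positions [0, src_len) are the source document (the prefix);
    positions [src_len, src_len + tgt_len) are the target keyphrase tokens.
    """
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < src_len:
                # Every position may see the entire source (bidirectional prefix).
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see themselves and earlier target positions only.
                mask[i][j] = 1
            # Otherwise: source positions never see target tokens,
            # and target positions never see later target tokens.
    return mask


if __name__ == "__main__":
    # Print the mask for a 3-token source and a 2-token target.
    for row in prefix_lm_mask(3, 2):
        print(row)
```

Under this assumption, a single encoder-only Transformer can act as both encoder and decoder: the mask alone determines which tokens are read bidirectionally and which are generated left to right.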
