Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study
Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing
TL;DR
The paper investigates zero-shot keyphrase extraction by prompting ChatGPT without any training data. It benchmarks ChatGPT against a wide range of unsupervised and supervised baselines on four datasets (Inspec, DUC2001, SemEval2010, OpenKP), reporting metrics such as $F1@5$ and $F1@M$, and analyzes how well the model understands long documents. The findings show that ChatGPT's zero-shot keyphrase extraction is competitive with simple baselines such as TF-IDF but generally lags behind state-of-the-art supervised methods such as HyperMatch and KIEMP, with additional gains possible through prompt engineering and fine-tuning. The study also reveals limitations in handling long documents, suggesting that longer context windows or architectures better suited to long-range context could improve performance. Overall, the work highlights both the potential and the current limits of zero-shot, prompt-based LLM approaches to keyphrase extraction.
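To make the $F1@5$ and $F1@M$ metrics concrete, here is a minimal Python sketch of top-$K$ F1 for a single document. It rests on several assumptions that are not taken from the paper: keyphrases match exactly after lowercasing and whitespace normalization (published evaluations commonly also apply stemming, omitted here to avoid dependencies), precision at $K$ divides by $K$ so that returning fewer than $K$ phrases is penalized, and $F1@M$ scores all of the model's predictions. The function names `normalize` and `f1_at_k` are hypothetical.

```python
from typing import List, Optional, Sequence


def normalize(phrase: str) -> str:
    # Lowercase and collapse internal whitespace; stemming is deliberately omitted.
    return " ".join(phrase.lower().split())


def f1_at_k(predicted: Sequence[str], gold: Sequence[str], k: Optional[int] = None) -> float:
    """F1 over the top-k ranked predictions; k=None gives F1@M (all predictions)."""
    # Normalize and de-duplicate predictions while preserving their rank order.
    ranked: List[str] = []
    seen = set()
    for phrase in predicted:
        norm = normalize(phrase)
        if norm and norm not in seen:
            seen.add(norm)
            ranked.append(norm)
    top = ranked if k is None else ranked[:k]
    gold_set = {normalize(g) for g in gold if normalize(g)}
    if not top or not gold_set:
        return 0.0
    matches = sum(1 for phrase in top if phrase in gold_set)
    if matches == 0:
        return 0.0
    # Assumption: precision@k divides by k, so missing predictions count as errors.
    precision = matches / (len(top) if k is None else k)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = ["Keyphrase extraction", "ChatGPT", "zero-shot learning", "prompt"]
    gold = ["keyphrase extraction", "zero-shot learning", "large language models"]
    print(round(f1_at_k(predicted, gold, k=5), 3))  # F1@5 -> 0.5
    print(round(f1_at_k(predicted, gold), 3))       # F1@M -> 0.571
```

In the example, two of the four predictions are correct: $F1@5$ divides precision by 5 while $F1@M$ divides by the 4 phrases actually returned, which is why the two scores differ. Dataset-level figures are typically obtained by averaging these per-document scores.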
Abstract
Zero-shot keyphrase extraction aims to build a keyphrase extractor without training on human-annotated data, which is challenging because so little human supervision is involved. Although challenging, the zero-shot setting is worthwhile because it reduces the time and effort that data labeling requires. Recent work on pre-trained large language models (e.g., ChatGPT and ChatGLM) shows promising performance in zero-shot settings, which inspires us to explore prompt-based methods. In this paper, we ask whether strong keyphrase extraction models can be constructed by directly prompting the large language model ChatGPT. Experimental results show that ChatGPT still has considerable room for improvement on the keyphrase extraction task compared with existing state-of-the-art unsupervised and supervised models.
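Directly prompting a chat model for keyphrases involves two steps around the model call: formatting the document into an instruction, and turning the free-text reply back into a ranked keyphrase list. The sketch below illustrates both under explicit assumptions. The wording in `PROMPT_TEMPLATE` is hypothetical and is not the prompt used in the paper, the model is assumed to answer with a comma-, semicolon-, or newline-separated list, and the model call itself is left out rather than tied to any particular client library. The names `build_prompt` and `parse_keyphrases` are likewise illustrative.

```python
import re
from typing import List

# Hypothetical prompt wording; NOT the exact prompt used in the paper.
PROMPT_TEMPLATE = (
    "Extract the most important keyphrases from the following document. "
    "Return them as a comma-separated list, most important first.\n\n"
    "Document: {document}\n\nKeyphrases:"
)


def build_prompt(document: str) -> str:
    # Insert the document text into the instruction template.
    return PROMPT_TEMPLATE.format(document=document.strip())


def parse_keyphrases(response: str) -> List[str]:
    """Split a model reply into an ordered, case-insensitively de-duplicated list."""
    phrases: List[str] = []
    seen = set()
    for piece in re.split(r"[,;\n]", response):
        # Drop leading list markers such as "1.", "2)", "-", "*", then surrounding quotes.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", piece).strip().strip("\"'")
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            phrases.append(cleaned)
    return phrases


if __name__ == "__main__":
    print(build_prompt("Large language models can extract keyphrases without training."))
    reply = "1. keyphrase extraction\n2. ChatGPT\n3. zero-shot learning, ChatGPT"
    print(parse_keyphrases(reply))  # ['keyphrase extraction', 'ChatGPT', 'zero-shot learning']
```

The parser keeps the model's ordering, so its output can be scored directly with top-$K$ metrics such as $F1@5$. Tolerating numbered or bulleted replies is a pragmatic choice, since chat models do not always follow a requested output format exactly.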
