Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study

Mingyang Song; Xuelian Geng; Songfang Yao; Shilong Lu; Yi Feng; Liping Jing

TL;DR

The paper investigates zero-shot keyphrase extraction by prompting ChatGPT directly, without any training data. It benchmarks ChatGPT against a broad set of unsupervised and supervised baselines on four datasets (Inspec, DUC2001, SemEval2010, OpenKP), reports $F1@5$ and $F1@M$, and separately analyzes long-document understanding. The findings show that ChatGPT's zero-shot keyphrase extraction is competitive with simple baselines such as TF-IDF but generally lags behind state-of-the-art supervised methods such as HyperMatch and KIEMP, with additional gains possible through prompt engineering and fine-tuning. The study also reveals limitations in handling long documents, suggesting that longer inputs or architectures better suited to long-range context could improve performance. Overall, the work highlights both the potential and the current limits of zero-shot, prompt-based LLM approaches to keyphrase extraction.
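As a rough illustration of the two metrics named above, the sketch below computes $F1@k$ for a single document. It assumes the common convention that $F1@5$ scores only the top five predictions and $F1@M$ scores every prediction the model returns; the function names are hypothetical, and the stemming step that keyphrase evaluations usually apply before matching is replaced here by simple lowercasing, so this is not the paper's evaluation script.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for stemming)."""
    return " ".join(phrase.lower().split())


def f1_at_k(predicted, gold, k=None):
    """F1 between the top-k predictions and the gold keyphrases.

    k=None scores all predictions, which is the assumed meaning of F1@M.
    """
    # Deduplicate predictions while preserving their ranking order.
    seen, preds = set(), []
    for phrase in predicted:
        norm = normalize(phrase)
        if norm and norm not in seen:
            seen.add(norm)
            preds.append(norm)
    if k is not None:
        preds = preds[:k]

    gold_set = {normalize(g) for g in gold if normalize(g)}
    if not preds or not gold_set:
        return 0.0

    correct = sum(1 for p in preds if p in gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(preds)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up example data, for illustration only.
example_pred = ["neural network", "Keyphrase Extraction", "dataset",
                "prompt", "language model", "zero-shot"]
example_gold = ["keyphrase extraction", "language model", "ChatGPT"]
f1_at_5 = f1_at_k(example_pred, example_gold, k=5)
f1_at_m = f1_at_k(example_pred, example_gold)
```

Corpus-level scores are typically obtained by averaging this per-document value over the whole test set.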

Abstract

Zero-shot keyphrase extraction aims to build a keyphrase extractor without training on human-annotated data, which is challenging because so little human supervision is involved. Challenging but worthwhile, the zero-shot setting reduces the time and effort that data labeling takes. Recent pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance in zero-shot settings, which inspires us to explore prompt-based methods. In this paper, we ask whether strong keyphrase extraction models can be constructed by directly prompting the large language model ChatGPT. Our experimental results show that ChatGPT still has considerable room for improvement on the keyphrase extraction task compared with existing state-of-the-art unsupervised and supervised models.
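To make "directly prompting" concrete, here is a minimal sketch of the two pieces a prompt-based extractor needs around the model call: building the instruction and parsing the free-text reply into a list of keyphrases. The prompt wording and both function names are assumptions made for illustration; they are not the prompts used in the paper, and the call to the model itself is deliberately left out.

```python
import re


def build_prompt(document: str) -> str:
    """Build a zero-shot instruction (hypothetical wording, not the paper's)."""
    return (
        "Extract the keyphrases from the following document. "
        "Return them as a comma-separated list.\n\n"
        "Document: " + document.strip()
    )


def parse_keyphrases(reply: str) -> list[str]:
    """Split a model reply on commas, semicolons, or newlines and clean each item."""
    phrases, seen = [], set()
    for part in re.split(r"[,;\n]", reply):
        # Drop list markers such as "-", "*", "1." or "2)" at the start of an item.
        item = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", part)
        item = item.strip().strip("\"'.").strip()
        key = item.lower()
        if item and key not in seen:
            seen.add(key)
            phrases.append(item)
    return phrases
```

The parsed list can then be scored against gold keyphrases; tolerating several reply layouts matters because a chat model does not always follow the requested output format.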

Paper Structure (7 sections, 4 tables)

This paper contains 7 sections, 4 tables.

Introduction
ChatGPT for Keyphrase Extraction
Evaluation Setting
Keyphrase Extraction Prompts
Overall Performance
Long Document Understanding
Conclusion
