Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models
Jayanth Mohan, Jishnu Ray Chowdhury, Tomas Malik, Cornelia Caragea
TL;DR
This work studies zero-shot keyphrase generation (KPG) by evaluating open-source instruction-tuned LLMs (Llama-3, Phi-3) and GPT-4o on five datasets, organized around three research questions. It systematically tests specialist prompts, instruction modifiers, and multi-sample aggregation, introducing aggregation strategies such as Frequency Order and dynamic keyphrase counts. The key finding is that specialist prompts and detailed instructions offer limited, inconsistent gains, whereas multi-sampling with a carefully chosen aggregation strategy substantially boosts performance, particularly for absent keyphrases. Overall, the approach is competitive with prior zero-shot work, shows domain effects across scientific and news datasets, and highlights the practical value of aggregation in LLM-based KPG.
Abstract
Keyphrases are the essential topical phrases that summarize a document. Keyphrase generation is a long-standing NLP task that aims to automatically generate keyphrases for a given document. While the task has been comprehensively explored in the past via various models, only a few works have performed a preliminary analysis of Large Language Models (LLMs) for it. Given the impact of LLMs in the field of NLP, it is important to conduct a more thorough examination of their potential for keyphrase generation. In this paper, we attempt to meet this need. Specifically, we focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task. We systematically investigate the effect of providing task-relevant specialized instructions in the prompt. Moreover, we design task-specific counterparts to self-consistency-style strategies for LLMs and show significant benefits from our proposals over the baselines.
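The multi-sample aggregation idea mentioned above can be illustrated with a small sketch. The text here does not spell out how the aggregation strategies are defined, so the following is only one plausible reading of a frequency-based ordering: sample several keyphrase lists from the model, count how many samples contain each phrase, and rank phrases by that count. The names `normalize` and `frequency_order`, the lowercase/whitespace normalization, and the first-seen tie-break are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def normalize(phrase: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(phrase.lower().split())


def frequency_order(samples, top_k=None):
    """Rank keyphrases by the number of samples that contain them.

    `samples` is a list of keyphrase lists, one per sampled LLM output.
    Ties are broken by first appearance (an assumption of this sketch).
    """
    counts = Counter()
    first_seen = {}
    for sample in samples:
        seen_in_sample = set()
        for phrase in sample:
            key = normalize(phrase)
            if not key or key in seen_in_sample:
                continue  # count each phrase at most once per sample
            seen_in_sample.add(key)
            counts[key] += 1
            first_seen.setdefault(key, len(first_seen))
    ranked = sorted(counts, key=lambda k: (-counts[k], first_seen[k]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical sampled outputs for a single document.
samples = [
    ["Neural Networks", "keyphrase generation"],
    ["keyphrase generation", "LLMs"],
    ["keyphrase  generation", "neural networks", "zero-shot"],
]
aggregated = frequency_order(samples, top_k=2)
```

With a fixed `top_k`, the aggregated list has a constant length; a dynamic keyphrase count would instead choose the cutoff per document, for which this sketch makes no specific claim.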
