
Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models

Jayanth Mohan, Jishnu Ray Chowdhury, Tomas Malik, Cornelia Caragea

TL;DR

This work examines zero-shot keyphrase generation (KPG) by evaluating open-source instruction-tuned LLMs (Llama-3, Phi-3) and the closed-source GPT-4o on five datasets, organized around three research questions. It systematically tests specialist prompts, instruction modifiers, and multi-sample aggregation, and introduces aggregation strategies such as Frequency Order and dynamic keyphrase counts. The key finding is that specialist prompts and detailed instructions yield limited and inconsistent gains, whereas sampling multiple outputs and aggregating them with a carefully chosen strategy substantially boosts performance, particularly for absent keyphrases (those that do not appear verbatim in the document). Overall, the approach is competitive with prior work in zero-shot settings, and domain effects are evident across the scientific and news datasets, highlighting the practical value of aggregation for LLM-based KPG.
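The summary names Frequency Order and dynamic keyphrase counts without spelling out their exact rules. The sketch below is one plausible reading, not the authors' implementation. It assumes that each sampled generation is already parsed into a list of keyphrases, that keyphrases are ranked by how many samples contain them (ties broken by first appearance), and that the "dynamic" count is the rounded mean number of distinct keyphrases per sample. The names `frequency_order` and `normalize` and the normalization scheme are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def normalize(kp: str) -> str:
    # Hypothetical normalization: lowercase and collapse internal whitespace.
    return " ".join(kp.lower().split())


def frequency_order(samples: List[List[str]], k: Optional[int] = None) -> List[str]:
    """Aggregate keyphrase lists from several sampled generations (sketch).

    Keyphrases are ranked by the number of samples containing them; ties are
    broken by order of first appearance. If k is None, an ASSUMED dynamic
    count is used: the rounded mean number of distinct keyphrases per sample.
    """
    counts: Counter = Counter()
    first_seen: Dict[str, int] = {}
    for sample in samples:
        seen_in_sample = set()
        for kp in sample:
            key = normalize(kp)
            # Count each keyphrase at most once per sample.
            if not key or key in seen_in_sample:
                continue
            seen_in_sample.add(key)
            counts[key] += 1
            if key not in first_seen:
                first_seen[key] = len(first_seen)
    # Higher sample frequency first, then earlier first appearance.
    ranked = sorted(counts, key=lambda kp: (-counts[kp], first_seen[kp]))
    if k is None:
        sizes = [len({normalize(kp) for kp in s if normalize(kp)}) for s in samples]
        k = round(sum(sizes) / len(sizes)) if sizes else 0
    return ranked[:k]


samples = [
    ["neural networks", "keyphrase generation", "LLMs"],
    ["keyphrase generation", "prompting"],
    ["Keyphrase Generation", "neural networks", "self-consistency"],
]
print(frequency_order(samples))       # dynamic k = round(8 / 3) = 3
print(frequency_order(samples, k=2))
```

Here "keyphrase generation" appears in all three samples and is ranked first. Counting a phrase once per sample keeps a single verbose generation from dominating the vote, which is the same intuition behind self-consistency-style majority voting.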

Abstract

Keyphrases are the essential topical phrases that summarize a document. Keyphrase generation is a long-standing NLP task for automatically generating keyphrases for a given document. While the task has been comprehensively explored in the past via various models, only a few works perform some preliminary analysis of Large Language Models (LLMs) for the task. Given the impact of LLMs in the field of NLP, it is important to conduct a more thorough examination of their potential for keyphrase generation. In this paper, we attempt to meet this demand with our research agenda. Specifically, we focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task. We systematically investigate the effect of providing task-relevant specialized instructions in the prompt. Moreover, we design task-specific counterparts to self-consistency-style strategies for LLMs and show significant benefits from our proposals over the baselines.

Paper Structure

This paper contains 19 sections, 2 equations, 3 figures, 10 tables.

Figures (3)

  • Figure 1: Baseline template used for Llama-3 and Phi-3.
  • Figure 2: Instructions used for Order Control and Length Control. The main values of the instruction variable are shown in the blue-bordered box. Differences in box size and colour are for visualization only and play no role in the actual prompt.
  • Figure 3: Visualization of Union Interleaf aggregation over multiple samples.
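Figure 3 is only captioned here, so the precise Union Interleaf procedure is not stated. The sketch below assumes one natural reading of the name: a round-robin union that takes the first keyphrase from every sample, then the second from every sample, and so on, skipping duplicates. The function name `union_interleave` and the lowercase/whitespace normalization are hypothetical.

```python
from typing import List


def union_interleave(samples: List[List[str]]) -> List[str]:
    """Round-robin union of keyphrase lists (ASSUMED reading of Union Interleaf).

    Position 0 of every sample is visited first, then position 1, etc.
    Duplicates (after normalization) are kept only at their first occurrence.
    """
    result: List[str] = []
    seen = set()
    max_len = max((len(s) for s in samples), default=0)
    for pos in range(max_len):
        for sample in samples:
            if pos < len(sample):
                # Hypothetical normalization: lowercase and collapse whitespace.
                key = " ".join(sample[pos].lower().split())
                if key and key not in seen:
                    seen.add(key)
                    result.append(key)
    return result


samples = [
    ["transformers", "attention", "nlp"],
    ["attention", "summarization"],
    ["keyphrases"],
]
print(union_interleave(samples))
```

Unlike a plain concatenation of samples, interleaving keeps each sample's top-ranked keyphrases near the front of the merged list, so truncating the result to a fixed length still draws from every sample.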