Attention-Seeker: Dynamic Self-Attention Scoring for Unsupervised Keyphrase Extraction

Erwin D. López Z.; Cheng Tang; Atsushi Shimada

TL;DR

Attention-Seeker is a parameter-free, unsupervised keyphrase extraction method that uses Self-Attention Maps (SAMs) from an open LLM (LLAMA 3-8B) to automatically identify which SAMs and document segments are most relevant for scoring keyphrase candidates. It reframes keyphrase extraction as selecting the SAMs most aligned with a task-specific hypothesis vector, then aggregating attention across layers and heads (and, for long documents, across abstract-derived segments) to produce final candidate scores. The approach achieves state-of-the-art results on three of four public datasets, performs strongly on long documents, and requires no manual tuning. These findings suggest that weighting internal SAMs and segment-level information can substantially improve unsupervised keyphrase extraction and may inform future analyses of LLM internal representations and feature selection.

Abstract

This paper proposes Attention-Seeker, an unsupervised keyphrase extraction method that leverages self-attention maps from a Large Language Model to estimate the importance of candidate phrases. Our approach identifies specific components - such as layers, heads, and attention vectors - where the model pays significant attention to the key topics of the text. The attention weights provided by these components are then used to score the candidate phrases. Unlike previous models that require manual tuning of parameters (e.g., selection of heads, prompts, hyperparameters), Attention-Seeker dynamically adapts to the input text without any manual adjustments, enhancing its practical applicability. We evaluate Attention-Seeker on four publicly available datasets: Inspec, SemEval2010, SemEval2017, and Krapivin. Our results demonstrate that, even without parameter tuning, Attention-Seeker outperforms most baseline models, achieving state-of-the-art performance on three out of four datasets, particularly excelling in extracting keyphrases from long documents.
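The scoring idea summarized above can be illustrated with a small sketch. It is not the paper's implementation: the exact formulas are not given in this summary, so the alignment measure (a dot product between the attention each token receives and a binary hypothesis vector marking candidate tokens), the aggregation rule, and all function names below are assumptions chosen for illustration. The attention maps are toy lists rather than real LLAMA 3-8B outputs.

```python
def received_attention(attn_map):
    """Column sums of one attention map: total attention each token receives."""
    n = len(attn_map)
    return [sum(row[j] for row in attn_map) for j in range(n)]


def map_relevance(attn_map, hypothesis):
    """Assumed alignment measure: dot product of the received-attention
    vector with the hypothesis vector (1 = token belongs to a candidate)."""
    received = received_attention(attn_map)
    return sum(r * h for r, h in zip(received, hypothesis))


def score_tokens(attn_maps, hypothesis):
    """Aggregate received attention over all maps, weighting each map
    by its relevance to the hypothesis (an illustrative choice)."""
    scores = [0.0] * len(hypothesis)
    for attn_map in attn_maps:
        weight = map_relevance(attn_map, hypothesis)
        received = received_attention(attn_map)
        for j, value in enumerate(received):
            scores[j] += weight * value
    return scores


def score_candidates(token_scores, candidates):
    """Sum token scores over the token positions of each candidate phrase."""
    return {
        phrase: sum(token_scores[i] for i in positions)
        for phrase, positions in candidates.items()
    }


# Toy document: "neural networks are useful" (4 tokens).
candidates = {"neural networks": [0, 1], "useful": [3]}
hypothesis = [1, 1, 0, 1]  # hypothetical: marks tokens inside any candidate

# Two toy maps: one attends to the first two tokens, one to the stopword.
map_a = [[0.5, 0.5, 0.0, 0.0] for _ in range(4)]
map_b = [[0.0, 0.0, 1.0, 0.0] for _ in range(4)]

token_scores = score_tokens([map_a, map_b], hypothesis)
candidate_scores = score_candidates(token_scores, candidates)
ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
```

In this toy setup the map that attends only to the stopword receives zero weight, so it contributes nothing to the final ranking. That mirrors the general idea of letting the input text decide which maps matter instead of fixing layers or heads by hand.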

Paper Structure (26 sections, 15 equations, 6 figures, 8 tables)

Introduction
Related work
Unsupervised Keyphrase Extraction
Self-Attention Map
Methodology
Candidate Generation
Extraction of the Self-Attention Maps
SAMs' Relevance Scoring: Hypothesis Engineering
Attention Scores Estimation: Short Documents
Attention Scores Estimation: Long Documents
Candidate Final Score Calculation
Experiments and Results
Datasets and Evaluation Metrics
Baselines
Results
...and 11 more sections

Figures (6)

Figure 1: The effect of different hypothesis vectors H in an LLM's distribution of attention scores.
Figure 2: The core architecture of Attention-Seeker for short documents.
Figure 3: The core architecture of Attention-Seeker for long documents.
Figure 4: Relevance of LLAMA 3-8B layers for keyphrase extraction estimated by Attention-Seeker over four datasets. (The lighter the color, the higher the relevance rank of the corresponding layer)
Figure 5: Relevance of LLAMA 3-8B heads for keyphrase extraction estimated by Attention-Seeker in one sample of the Inspec dataset.
...and 1 more figures
