Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction

Yuanzhen Luo; Qingyu Zhou; Feng Zhou

TL;DR

Diff-KPE uses a supervised Variational Information Bottleneck (VIB) to guide a text diffusion process that generates enhanced keyphrase representations. It outperforms existing KPE methods on OpenKP, a large open-domain keyphrase extraction benchmark, and on KP20K, a scientific-domain dataset.

Abstract

Keyphrase extraction (KPE) is an important Natural Language Processing task that aims to extract the keyphrases present in a given document. Many existing supervised methods treat KPE as a sequence labeling, span-level classification, or generative task. However, these methods cannot make use of keyphrase information, which may bias their results. In this study, we propose Diff-KPE, which leverages a supervised Variational Information Bottleneck (VIB) to guide the text diffusion process for generating enhanced keyphrase representations. Diff-KPE first generates the desired keyphrase embeddings conditioned on the entire document and then injects the generated keyphrase embeddings into each phrase representation. A ranking network and the VIB are then optimized jointly, with a rank loss and a classification loss, respectively. This design allows Diff-KPE to rank each candidate phrase using information from both the keyphrases and the document. Experiments show that Diff-KPE outperforms existing KPE methods on OpenKP, a large open-domain keyphrase extraction benchmark, and on KP20K, a scientific-domain dataset.
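
As a rough illustration of the joint objective described in the abstract, the sketch below combines a rank loss with a VIB-style classification loss for a single pair of candidate phrases. It is a minimal sketch under stated assumptions, not the paper's implementation: the margin form of the rank loss, the KL term against a standard normal prior, the weight `beta`, and all function names are hypothetical choices made for illustration, and the diffusion loss is omitted entirely.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions.

    Assumption: the VIB prior is a standard normal, the usual textbook choice.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def binary_cross_entropy(prob, label):
    """Cross-entropy for one keyphrase / non-keyphrase decision."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))


def vib_classification_loss(prob, label, mu, log_var, beta=1e-3):
    """Classification loss plus a beta-weighted compression (KL) penalty.

    Assumption: beta is a hypothetical trade-off weight, not taken from the paper.
    """
    return binary_cross_entropy(prob, label) + beta * kl_to_standard_normal(mu, log_var)


def margin_rank_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss that pushes a keyphrase score above a non-keyphrase score.

    Assumption: a margin-based pairwise loss stands in for the rank loss.
    """
    return max(0.0, margin - pos_score + neg_score)


def joint_loss(pos_score, neg_score, prob, label, mu, log_var, beta=1e-3):
    """Sum of the two terms; the diffusion term is left out of this sketch."""
    return margin_rank_loss(pos_score, neg_score) + vib_classification_loss(
        prob, label, mu, log_var, beta
    )


if __name__ == "__main__":
    # One keyphrase candidate scored above one non-keyphrase candidate.
    loss = joint_loss(
        pos_score=2.0, neg_score=0.5,
        prob=0.9, label=1,
        mu=[0.1, -0.2], log_var=[0.0, 0.0],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the rank term vanishes once the keyphrase score exceeds the non-keyphrase score by the margin, so the remaining loss comes from the classifier and the compression penalty.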

Paper Structure (24 sections, 13 equations, 2 figures, 5 tables)

Introduction
Related Work
Keyphrase Extraction
Diffusion Models for Text
Variational Information Bottleneck in NLP
Methodology
Phrase Representation
Keyphrase Embeddings Generation
Input Encoding
Diffusion Generation Process
Keyphrase Ranking
Keyphrase Classification
Optimization and Inference
Experiments
Datasets
...and 9 more sections

Figures (2)

Figure 1: Diff-KPE is jointly trained with a continuous diffusion module, a variational information bottleneck, and a rank network. The black dashed box marks the diffusion module, the blue dashed box the VIB module, and the purple dashed box the rank network.
Figure 2: t-SNE visualization of phrase embeddings from the OpenKP dataset.
