Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts

Florian Boudin; Akiko Aizawa

Abstract

Adapting keyphrase generation models to new domains typically involves few-shot fine-tuning with in-domain labeled data. However, annotating documents with keyphrases is often prohibitively expensive and impractical, requiring expert annotators. This paper presents silk, an unsupervised method designed to address this issue by extracting silver-standard keyphrases from citation contexts to create synthetic labeled data for domain adaptation. Extensive experiments across three distinct domains demonstrate that our method yields high-quality synthetic samples, resulting in significant and consistent improvements in in-domain performance over strong baselines.

Paper Structure (24 sections, 1 equation, 2 figures, 14 tables)

Introduction
Method
Datasets
Natural Language Processing (nlp)
Astrophysics (astro)
Paleontology (paleo)
Statistics and Analysis
Experimental Settings
Initial Model
Domain Adaptation
Baselines
Datasets and Evaluation Metrics
Performance of Models on KP20k
Results
Confidence Ranking of synthetic samples
...and 9 more sections

Figures (2)

Figure 1: Illustration of the silk method for mining silver-standard keyphrases (highlighted in red) from citation contexts and generating synthetic samples for adapting models to new domains.
Figure 2: t-SNE 2-D projections of the gold keyphrases from $\bullet$ KP20k, $\bullet$ nlp, $\bullet$ astro and $\bullet$ paleo. We leverage SPECTER to compute the keyphrase embeddings and use the first 500 documents from KP20k for clarity.
