Embedded Topic Models Enhanced by Wikification

Takashi Shibuya; Takehito Utsuro

TL;DR

This work tackles word homography in topic modeling by injecting entity knowledge from Wikipedia into neural topic models. By combining wikification-based entity linking with Wikipedia2Vec embeddings, the approach feeds ETM and Dynamic ETM with both word and entity representations, which lets the models disambiguate homographs such as "apple" and "amazon" and makes topics easier to interpret. Empirical results on NYT and AIDA-CoNLL show improved generalization (measured by perplexity) and sensible temporal topic dynamics, and qualitative topic-transition visualizations show that entity mentions make topics more interpretable. The method shows potential for more accurate, entity-aware topic analysis of corpora with ambiguous terms, although it depends on high-quality entity linking and is subject to biases in the embeddings.

Abstract

Topic modeling analyzes a collection of documents to learn meaningful patterns of words. However, previous topic models consider only the spelling of words and do not take the homography of words into consideration. In this study, we incorporate Wikipedia knowledge into a neural topic model to make it aware of named entities. We evaluate our method on two datasets: 1) news articles from the New York Times and 2) the AIDA-CoNLL dataset. Our experiments show that our method improves the generalizability of neural topic models. Moreover, we analyze frequent terms in each topic and the temporal dependencies between topics to demonstrate that our entity-aware topic models can capture the time-series development of topics well.
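
The entity-aware preprocessing described above can be illustrated with a minimal sketch. It assumes an entity linker has already produced character spans with Wikipedia titles; the function name `wikify_tokens`, the span format, and the `ENTITY/` prefix are illustrative choices, not the authors' implementation. The point is only that a linked mention becomes a vocabulary item distinct from the identically spelled common word.

```python
from collections import Counter


def wikify_tokens(text, mentions):
    """Tokenize text, replacing linked mention spans with entity tokens.

    mentions: list of (start, end, wikipedia_title) character spans,
    assumed non-overlapping (hypothetical entity-linker output).
    """
    tokens = []
    cursor = 0
    for start, end, title in sorted(mentions):
        # Plain words before the mention are lowercased and split on whitespace.
        tokens.extend(text[cursor:start].lower().split())
        # The mention itself becomes a single entity token.
        tokens.append("ENTITY/" + title.replace(" ", "_"))
        cursor = end
    tokens.extend(text[cursor:].lower().split())
    return tokens


# "Apple" the company is linked; "apple" the fruit is not.
tokens_a = wikify_tokens("Apple unveiled a new phone", [(0, 5, "Apple Inc.")])
tokens_b = wikify_tokens("She ate an apple", [])

# Bag-of-words counts over the mixed word/entity vocabulary.
bow_a = Counter(tokens_a)
bow_b = Counter(tokens_b)
```

With this representation, the two documents no longer share a term for "apple", so a topic model trained on such counts can place the company and the fruit in different topics. Each entity token would then be paired with an entity embedding and each word with a word embedding.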

Paper Structure (21 sections, 3 figures, 3 tables)

This paper contains 21 sections, 3 figures, 3 tables.

Introduction
Related Work
Neural Topic Models
Dynamic Topic Models
Entity Embeddings
Topic Models with Wikipedia
Topic Models
Latent Dirichlet Allocation (LDA)
Embedded Topic Model (ETM)
Dynamic Embedded Topic Model (D-ETM)
Proposed Method
Incorporation of Entity Linking
Experiments
Fine-Grained Topic Modeling
Experimental Setup
...and 6 more sections

Figures (3)

Figure 1: Processing flows of conventional topic models and our proposed topic model.
Figure 2: Difference between conventional embedded topic models and our proposed topic model.
Figure 3: Examples of topic transition. We present the top five most frequent terms in each topic.
