Labeled Interactive Topic Models

Kyle Seelman; Mozhi Zhang; Jordan Boyd-Graber

TL;DR

This work addresses the lack of intuitive, user-driven guidance in neural topic models. It introduces Interactive Neural Topic Modeling (i-NTM), which lets users update topic embeddings by assigning labels to topics. It does this through two mechanisms: learnable topic embeddings that are adjusted during training, and post-training adjustments that reweight topic-word distributions with a label-aware formulation. The authors provide a user interface, automatic metrics, and a human study showing that users who label topics retrieve more relevant documents, which supports the approach's practical value for time-sensitive information needs. The approach makes neural topics more interpretable and more relevant to the user's task, and it opens avenues for integrating user feedback with neural representations, potentially extending to LLM-based topic modeling. The update $\alpha_k^{new} = \alpha_k^{old} + \lambda(w_k - \alpha_k^{old}) = \lambda w_k + (1-\lambda)\alpha_k^{old}$, where $w_k$ is the embedding of the label word assigned to topic $k$ and $\lambda \in [0,1]$ controls the size of the shift, illustrates the core embedding-shift idea behind label-driven topic refinement.
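For models whose topic embeddings are added after training, a topic has no trained vector of its own, so there must first be an embedding center to move. The sketch below shows one way such a post-training adjustment could work. Its assumptions are ours, not the paper's: the center is the probability-weighted mean of the topic's top-word embeddings; it is shifted toward the label with alpha_new = alpha_old + lam * (w - alpha_old); and the vocabulary is then re-ranked by cosine similarity to the shifted center. The function names, the step size `lam`, and the toy embeddings are all hypothetical.

```python
import math


def weighted_center(top_words, word_embs):
    """Probability-weighted mean of the embeddings of a topic's top words (assumed construction)."""
    dim = len(next(iter(word_embs.values())))
    total = sum(p for _, p in top_words)
    center = [0.0] * dim
    for word, p in top_words:
        for i, x in enumerate(word_embs[word]):
            center[i] += p * x / total
    return center


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rerank(center, word_embs, k):
    """Return the k vocabulary words most similar to the topic center."""
    return sorted(word_embs, key=lambda w: cosine(center, word_embs[w]), reverse=True)[:k]


# Toy two-dimensional embeddings (illustrative values only).
emb = {
    "trade": [1.0, 0.1],
    "tariff": [0.9, 0.2],
    "india": [0.2, 1.0],
    "delhi": [0.3, 0.9],
}
topic_top_words = [("trade", 0.6), ("tariff", 0.4)]

center = weighted_center(topic_top_words, emb)
lam = 0.8  # hypothetical step size toward the label word
# Shift the center toward the label "india": alpha_new = alpha_old + lam * (w - alpha_old).
shifted = [c + lam * (w - c) for c, w in zip(center, emb["india"])]
print(rerank(center, emb, 2), rerank(shifted, emb, 2))
```

In this toy setup, the top words move from the trade vocabulary to the label word and its neighbor ("delhi"). This is the qualitative behavior the paper describes: the topic's words become aligned with the label.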

Abstract

Topic models are valuable for understanding large document collections, but they do not always identify the most relevant topics. Classical probabilistic and anchor-based topic models have interactive versions that let users guide the models toward more pertinent topics. However, such interactive features have been lacking in neural topic models. To fill this gap, we introduce a user-friendly interaction for neural topic models: users assign a word label to a topic, and the model is updated so that the words in the topic become closely aligned with the given label. Our approach covers two kinds of neural topic models. In the first, topic embeddings are trainable and evolve during training. In the second, topic embeddings are added after training, which offers a different route to topic refinement. To support user interaction with these neural topic models, we have developed an interactive interface that lets users inspect and relabel topics as desired. We evaluate our method through a human study in which users relabel topics to find relevant documents. With our method, user labeling improves document ranking scores and surfaces more documents relevant to a given query than no labeling does.
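For the first kind of model, where topics are trainable embeddings in the same space as the words, a common formulation (used, for example, by the Embedded Topic Model) defines the topic-word distribution as a softmax over word-topic dot products, beta_k(w) proportional to exp(rho_w · alpha_k). The sketch below assumes that form, a hypothetical interpolation step `relabel`, and toy two-dimensional embeddings. It shows how moving a topic embedding toward a label word raises that word's probability within the topic.

```python
import math


def topic_word_distribution(topic_emb, word_embs):
    """Softmax over dot products between each word embedding and the topic embedding."""
    scores = {w: sum(t * e for t, e in zip(topic_emb, emb)) for w, emb in word_embs.items()}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: v / z for w, v in exps.items()}


def relabel(topic_emb, label_emb, lam):
    """Interpolate the topic embedding toward the label word's embedding (hypothetical helper)."""
    return [t + lam * (lab - t) for t, lab in zip(topic_emb, label_emb)]


# Toy vocabulary embeddings (illustrative values only).
words = {
    "cricket": [2.0, 0.0],
    "india": [0.0, 2.0],
    "football": [1.5, -0.5],
}
topic = [1.0, 0.0]

before = topic_word_distribution(topic, words)
after = topic_word_distribution(relabel(topic, words["india"], 0.5), words)
# The label word's probability rises after the topic is relabeled.
print(round(before["india"], 3), round(after["india"], 3))
```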

Paper Structure (22 sections, 4 equations, 4 figures, 3 tables)

Topic Models Need Help
The Best of Both Worlds: Neural Word Knowledge and Bayesian Informative Priors
Latent Dirichlet Allocation
Neural Topic Models
Interactive Neural Topic Modeling
Adjusting Learnable Topic Embeddings
Adding Adjustable Topic Embeddings After Training
User Interface
Automatic Metrics
Human Study
i-NTM Experimental Results
Labeling Improves Coherence
Human Study
Related Work
Neural topic models
...and 7 more sections

Figures (4)

Figure 1: Visual representation of labeling a new topic with our method, as in the example table (tab:example). Our method moves the embedding center for the topic closer to the new label word, in this case, India.
Figure 2: Human study interface for i-NTM, using CTM as the neural model. Users see the topics found for a set of tasks/requests and can change a topic's label to better fit their needs. The documents assigned to each topic are also shown, and users can select the most relevant ones.
Figure 3: Labeling topics reveals documents that would otherwise be missed. For each question, the plot shows the maximum number of new documents (documents not previously associated with the topic) found across all users. A single labeling of a topic can reveal a large number of new documents.
Figure 4: Average BM25 document ranking scores for each of the 5 questions, averaged over the 20 users. User-provided topic labels find more relevant documents and significantly improve document ranking scores.
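The scores in Figure 4 are BM25 scores. For reference, here is a self-contained sketch of standard Okapi BM25. The parameters k1 = 1.5 and b = 0.75 are common defaults. The non-negative IDF variant is one common choice, not necessarily the one the study used. The whitespace tokenization and toy corpus are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.

    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)),
    where N is the number of documents and n the number containing the term.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "india trade tariff talks".split(),
    "cricket match in india".split(),
    "football league results".split(),
]
query = "india trade".split()
scores = bm25_scores(query, corpus)
print(scores)
print(sum(scores) / len(scores))  # mean BM25 score over these documents
```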
