NeuroMax: Enhancing Neural Topic Modeling via Maximizing Mutual Information and Group Topic Regularization

Duy-Tung Pham; Thien Trang Nguyen Vu; Tung Nguyen; Linh Ngo Van; Duc Anh Nguyen; Thien Huu Nguyen

TL;DR

This work proposes NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization), a framework that maximizes the mutual information between the topic representation produced by the encoder of a neural topic model and the representation derived from a pretrained language model (PLM).
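The TL;DR does not say which mutual-information estimator NeuroMax uses. A common way to maximize mutual information between two views of the same document is the InfoNCE contrastive bound, and the sketch below assumes that choice. The PLM embedding of document i is the positive for topic representation i, and the other documents in the batch act as negatives. The function names, the temperature `tau`, and the assumption that topic representations are already projected to the PLM dimension are hypothetical, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def infonce_loss(topic_reps: List[Vector], plm_reps: List[Vector], tau: float = 0.1) -> float:
    """Batch-averaged InfoNCE loss (hypothetical stand-in for NeuroMax's MI term).

    topic_reps[i]: encoder topic representation of document i, assumed to be
                   already projected to the PLM embedding dimension.
    plm_reps[i]:   PLM embedding of the same document (the positive pair);
                   all other rows in the batch serve as negatives.
    """
    n = len(topic_reps)
    total = 0.0
    for i in range(n):
        logits = [cosine(topic_reps[i], plm_reps[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n

def mi_lower_bound(topic_reps: List[Vector], plm_reps: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE bound: I(topic; plm) >= log(N) - loss, so minimizing the loss raises the bound."""
    return math.log(len(topic_reps)) - infonce_loss(topic_reps, plm_reps, tau)
```

With matching pairs, `infonce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` is close to 0, while swapping the PLM rows gives a loss close to 10. The bound can never exceed log N, which is one reason contrastive objectives usually benefit from larger batches.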

Abstract

Recent advances in neural topic models have concentrated on two primary directions: the integration of the inference network (encoder) with a pre-trained language model (PLM) and the modeling of the relationship between words and topics in the generative model (decoder). However, the use of large PLMs significantly increases inference costs, making them less practical for situations requiring low inference times. Furthermore, it is crucial to simultaneously model the relationships between topics and words as well as the interrelationships among topics themselves. In this work, we propose a novel framework called NeuroMax (Neural Topic Model with Maximizing Mutual Information with Pretrained Language Model and Group Topic Regularization) to address these challenges. NeuroMax maximizes the mutual information between the topic representation obtained from the encoder in neural topic models and the representation derived from the PLM. Additionally, NeuroMax employs optimal transport to learn the relationships between topics by analyzing how information is transported among them. Experimental results indicate that NeuroMax reduces inference time, generates more coherent topics and topic groups, and produces more representative document embeddings, thereby enhancing performance on downstream tasks.
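The abstract states that NeuroMax uses optimal transport to analyze how information moves among topics but gives no details of the cost, marginals, or solver. As an illustration only, the sketch below computes an entropy-regularized transport plan among topic embeddings with standard Sinkhorn iterations. The squared-Euclidean cost, uniform marginals, zeroed diagonal (a topic may not transport mass to itself), the value of `eps`, and the function names are all assumptions rather than the paper's settings.

```python
import math
from typing import List

Matrix = List[List[float]]

def sinkhorn(K: Matrix, a: List[float], b: List[float], n_iters: int = 200) -> Matrix:
    """Sinkhorn scaling: returns P = diag(u) K diag(v) with column sums b and row sums ~a."""
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def topic_transport_plan(topic_emb: List[List[float]], eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Hypothetical topic-to-topic transport plan (requires at least 2 topics)."""
    k = len(topic_emb)
    cost = [[sum((x - y) ** 2 for x, y in zip(topic_emb[i], topic_emb[j])) for j in range(k)]
            for i in range(k)]
    scale = max(max(row) for row in cost) or 1.0  # normalize costs to [0, 1] to avoid exp underflow
    # Gibbs kernel; the diagonal is set to 0 (infinite self-cost), an assumption that
    # forces every topic to send its mass to other topics.
    K = [[0.0 if i == j else math.exp(-cost[i][j] / (scale * eps)) for j in range(k)]
         for i in range(k)]
    uniform = [1.0 / k] * k
    return sinkhorn(K, uniform, uniform, n_iters)
```

For embeddings `[[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]`, nearly all of topic 0's mass goes to topic 1 and topic 2's mass to topic 3. Reading a large `P[i][j]` as high information sharing between topics i and j is one plausible interpretation; how NeuroMax turns such a plan into topic groups is not described in this section.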

Paper Structure (21 sections, 14 equations, 2 figures, 8 tables)

Introduction
Related Work
Background
Proposed Method
Maximize Mutual Information with Pretrained Language Model
Group Topic Regularization
Overall objective function
Experiments
Settings
Topic Quality and Doc-Topic Distribution Quality
Ablation Study
Inference time
Conclusion
Appendix
Algorithm
...and 6 more sections

Figures (2)

Figure 1: High-level architecture of our encoder. Dashed lines represent the parts of the model that can be excluded at inference time.
Figure 2: (Left) t-SNE visualization of topic embeddings (black dots) and the embeddings of their top 10 words (colored dots). Word embeddings for topics within the same group share the same color. Pairs of topics with high information-sharing scores are highlighted in gray. (Right) The corresponding top 10 words for each topic.
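Figure 1 notes that part of the encoder can be dropped at inference time, which is where the claimed reduction in inference cost comes from. The following is a minimal sketch of that control flow under stated assumptions: the class `ToyTopicEncoder`, its linear layers, and the `training` flag are hypothetical, it omits the variational sampling step typical of neural topic model encoders, and the PLM itself is assumed to be called only inside the training loop.

```python
import math
import random
from typing import List, Optional, Tuple

def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ToyTopicEncoder:
    """Hypothetical encoder: bag-of-words counts -> topic proportions theta.

    The projection head `proj` only feeds a training-time alignment loss against
    PLM embeddings; it (and the PLM) is skipped entirely at inference.
    """

    def __init__(self, vocab_size: int, num_topics: int, plm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)] for _ in range(num_topics)]
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(num_topics)] for _ in range(plm_dim)]

    def forward(self, bow: List[float], training: bool = False) -> Tuple[List[float], Optional[List[float]]]:
        logits = [sum(w * x for w, x in zip(row, bow)) for row in self.weights]
        theta = softmax(logits)
        if not training:
            return theta, None  # inference: the dashed, PLM-related branch never runs
        # training: project theta into the PLM space for the mutual-information term
        projected = [sum(p * t for p, t in zip(row, theta)) for row in self.proj]
        return theta, projected
```

Because the topic proportions are computed before the branch, the inference path returns the same `theta` as the training path while skipping both the projection and any PLM forward pass.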
