On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling

Xiaobao Wu; Fengjun Pan; Thong Nguyen; Yichao Feng; Chaoqun Liu; Cong-Duy Nguyen; Anh Tuan Luu

TL;DR

TraCo addresses three core issues in hierarchical topic modeling (low affinity between child and parent topics, low rationality of topic granularity across levels, and low diversity among sibling topics) by introducing Transport Plan Dependency (TPD) and a Context-aware Disentangled Decoder (CDD). TPD casts inter-level topic dependencies as entropic-regularized optimal transport plans solved with Sinkhorn iterations, which enforces sparsity and balance. The transport cost between child topic $k$ at level $\ell+1$ and parent topic $k'$ at level $\ell$ is $C^{(\ell)}_{kk'} = \| \boldsymbol{t}^{(\ell+1)}_{k} - \boldsymbol{t}^{(\ell)}_{k'} \|^{2}$. CDD decodes each level separately and injects a contextual topical bias $\boldsymbol{b}^{(\ell)}$ derived from neighboring levels, which pushes topics at different levels toward different semantic granularity and improves rationality. The overall objective combines the TPD regularizer with a VAE-style topic-modeling (TM) loss: $\min_{\text{params}} \frac{\text{TPD}}{L-1} + \frac{1}{N}\text{TM}$, where the TM term includes per-level reconstructions with bias and a KL term for the latent variable r. Empirical results on NeurIPS, ACL, NYT, WikiText-103, and 20NG show that TraCo consistently outperforms baselines in topic quality, in hierarchical affinity, rationality, and diversity, and on downstream tasks, yielding topic hierarchies that are more interpretable and more useful for those tasks.
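To make the TPD idea more concrete, here is a minimal NumPy sketch of an entropic-regularized transport plan between child and parent topic embeddings, computed with plain Sinkhorn iterations. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy embeddings, the regularization strength `epsilon`, and the uniform parent weights are all hypothetical choices. Only the squared-distance cost and the child weight of $1/K^{(\ell+1)}$ come from the description on this page.

```python
import numpy as np

def squared_distance_cost(child_emb, parent_emb):
    """Cost C[k, k'] = ||t_child[k] - t_parent[k']||^2."""
    diff = child_emb[:, None, :] - parent_emb[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sinkhorn_plan(cost, child_weights, parent_weights, epsilon=0.5, n_iters=200):
    """Entropic-regularized OT plan via standard Sinkhorn scaling.

    epsilon and n_iters are illustrative values, not taken from the paper.
    """
    kernel = np.exp(-cost / epsilon)
    u = np.ones_like(child_weights)
    v = np.ones_like(parent_weights)
    for _ in range(n_iters):
        u = child_weights / (kernel @ v)      # match row (child) marginals
        v = parent_weights / (kernel.T @ u)   # match column (parent) marginals
    return u[:, None] * kernel * v[None, :]

# Toy setup (hypothetical): 2 parent topics, 6 child topics in 2-D.
parent_emb = np.array([[0.0, 0.0], [3.0, 3.0]])
child_emb = np.array([
    [0.1, 0.0], [-0.1, 0.1], [0.0, -0.2],   # near parent 0
    [3.1, 3.0], [2.9, 3.1], [3.0, 2.8],     # near parent 1
])

# Each child carries weight 1/K_child; parent weights are assumed uniform here.
child_weights = np.full(len(child_emb), 1.0 / len(child_emb))
parent_weights = np.full(len(parent_emb), 1.0 / len(parent_emb))

cost = squared_distance_cost(child_emb, parent_emb)
plan = sinkhorn_plan(cost, child_weights, parent_weights)

# A TPD-style regularizer: total transport cost under the plan.
tpd_value = float(np.sum(plan * cost))
print(plan.round(4))
print(tpd_value)
```

In this toy case each row of `plan` puts almost all of its mass on a single parent (sparsity), while the column sums equal the prescribed parent weights (balance). Minimizing `tpd_value` with respect to the embeddings would pull each child toward the parent it is transported to.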

Abstract

Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work tends to produce topic hierarchies with low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of the simple topic dependencies used in earlier work, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also uses them to regularize the building of topic hierarchies. This improves the affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. In contrast to the entangled decoding of previous work, it assigns different semantic granularity to topics at different levels through disentangled decoding. This improves the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling and achieving better performance on downstream tasks.
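The disentangled decoding idea can be sketched as follows: each level reconstructs the document on its own, with a bias vector built from the neighboring levels added to its logits. This is a hedged illustration only. The way the bias is built here (keeping the top-`n_top` entries of each neighboring topic's word distribution and summing them), the softmax link, and all names and toy sizes are assumptions made for this example, not the paper's confirmed formulas.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_word_bias(neighbor_betas, n_top=2, scale=1.0):
    """Hypothetical contextual bias: keep only the top words of every
    topic at the neighboring levels and sum them over topics."""
    vocab_size = neighbor_betas[0].shape[1]
    bias = np.zeros(vocab_size)
    for beta in neighbor_betas:
        top_idx = np.argsort(beta, axis=1)[:, -n_top:]
        mask = np.zeros_like(beta)
        np.put_along_axis(mask, top_idx, 1.0, axis=1)
        bias += (beta * mask).sum(axis=0)
    return scale * bias

def level_reconstruction_loss(x, theta, beta, bias):
    """Negative log-likelihood of bag-of-words x decoded from one level only."""
    word_probs = softmax(theta @ beta + bias)
    return float(-(x * np.log(word_probs + 1e-12)).sum())

# Toy hierarchy (hypothetical sizes): 3 levels over a 6-word vocabulary.
rng = np.random.default_rng(0)
vocab_size = 6
topics_per_level = [2, 3, 4]
betas = [softmax(rng.standard_normal((k, vocab_size))) for k in topics_per_level]
thetas = [softmax(rng.standard_normal(k)) for k in topics_per_level]
x = np.array([2.0, 0.0, 1.0, 0.0, 3.0, 1.0])  # word counts of one document

losses = []
for level, (theta, beta) in enumerate(zip(thetas, betas)):
    # Contextual levels are the adjacent ones (level - 1 and level + 1), when present.
    neighbors = [betas[j] for j in (level - 1, level + 1) if 0 <= j < len(betas)]
    bias = top_word_bias(neighbors)
    losses.append(level_reconstruction_loss(x, theta, beta, bias))

total_reconstruction = sum(losses)
print(losses, total_reconstruction)
```

Because the bias already accounts for words that dominate the neighboring levels, the topics at the current level gain little from covering those same words. That is the intuition behind steering each level toward its own granularity.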

Paper Structure (36 sections, 10 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Conventional Hierarchical Topic Models
Neural Hierarchical Topic Models
Methodology
Problem Setting and Notations
Parameterizing Hierarchical Latent Topics
Transport Plan Dependency
Why Low Affinity and Diversity?
Modeling Dependencies as Transport Plans
Objective for TPD
Inferring Doc-Topic Distributions of Levels
Context-aware Disentangled Decoder
Why Low Rationality?
Contextual Topical Bias
...and 21 more sections

Figures (6)

Figure 1: Illustration of low affinity (left), and low rationality and diversity issues (right) from WikiText-103 and NeurIPS. Each rectangle is the top related words of a topic from HyperMiner (Xu et al., 2022). Repetitive words are underlined.
Figure 2: t-SNE visualization (van der Maaten and Hinton, 2008) of learned child ($\bullet$) and parent ($\blacktriangle$) topic embeddings of two levels. (a,b): Some child topic embeddings are not close enough to their parents; some are excessively gathered together. (c): TraCo pushes each child topic embedding close only to its parent and away from others, and avoids gathering too many together.
Figure 3: Illustration of TPD. It models the dependency $\boldsymbol{\mathbf{\varphi}}^{(\ell)}_{kk'}$ as the transport plan from topic embedding $\boldsymbol{\mathbf{t}}^{(\ell+1)}_{k}$ to $\boldsymbol{\mathbf{t}}^{(\ell)}_{k'}$ in measures $\gamma^{(\ell+1)}$ and $\phi^{(\ell)}$, constrained by the weight of $\boldsymbol{\mathbf{t}}^{(\ell+1)}_{k}$ as $1/K^{(\ell+1)}$ and $\boldsymbol{\mathbf{t}}^{(\ell)}_{k'}$ as $s^{(\ell)}_{k'}$. Here TPD pushes $\boldsymbol{\mathbf{t}}^{(\ell+1)}_1$ close to $\boldsymbol{\mathbf{t}}^{(\ell)}_{1}$ and away from others, similar for $\boldsymbol{\mathbf{t}}^{(\ell+1)}_{2}$.
Figure 4: Comparison of decoders for hierarchical topic modeling. Here $\boldsymbol{\mathbf{\beta}}^{(\ell)}$ and $\boldsymbol{\mathbf{\theta}}^{(\ell)}$ are the topic-word distribution matrix and doc-topic distribution at level $\ell$ respectively. $\boldsymbol{\mathbf{x}}$ is an input document to be decoded. (a): Decoding only with the lowest level. (b): Decoding with all levels. (c): Decoding with each level individually. For example, here the decoding using level $\ell$ incorporates the contextual topical bias $\boldsymbol{\mathbf{b}}^{(\ell)}$. The bias includes topical semantics from contextual levels ($\ell\!-\!1$ and $\ell\!+\!1$), like the top related words "neural layer network" and "resnet convnet highway". This encourages topics at level $\ell$ ($\boldsymbol{\mathbf{\beta}}^{(\ell)}$) to cover semantics different from them, like "deep convolutional cnn" (See this example in case studies). It is similar for other levels.
Figure 5: Case study: discovered topic hierarchies from different datasets. Each rectangle is the top related words of a topic.
...and 1 more figure
