Hierarchical Graph Topic Modeling with Topic Tree-based Transformer

Delvin Ce Zhang, Menglin Yang, Xiaobao Wu, Jiasheng Zhang, Hady W. Lauw

TL;DR

GTFormer learns document representations that respect both the topic hierarchy within documents and the graph hierarchy across linked documents. It embeds a topic tree in hyperbolic space with a Hyperbolic Doubly Recurrent Neural Network, embeds the graph structure with a Hyperbolic Graph Neural Network, and inserts both into each Transformer layer. The approach improves document classification, link prediction, and topic coherence across multiple datasets, outperforming flat topic models as well as prior hierarchical and text-attributed graph methods. This unified, geometry-aware framework advances hierarchical representation learning for large, structured text corpora and suggests opportunities for multilingual and real-time applications.

Abstract

Textual documents are commonly connected in a hierarchical graph structure where a central document links to others with exponentially growing connectivity. Though Hyperbolic Graph Neural Networks (HGNNs) excel at capturing such graph hierarchy, they cannot model the rich textual semantics within documents. Moreover, the text in documents usually discusses topics of different specificity. Hierarchical Topic Models (HTMs) discover such latent topic hierarchy within text corpora. However, most of them focus on the textual content within documents and ignore the graph adjacency across interlinked documents. We thus propose a Hierarchical Graph Topic Modeling Transformer to integrate both topic hierarchy within documents and graph hierarchy across documents into a unified Transformer. Specifically, to incorporate topic hierarchy within documents, we design a topic tree and infer a hierarchical tree embedding for hierarchical topic modeling. To preserve both topic and graph hierarchies, we design our model in hyperbolic space and propose a Hyperbolic Doubly Recurrent Neural Network, which models ancestral and fraternal tree structure. Both hierarchies are inserted into each Transformer layer to learn unified representations. Both supervised and unsupervised experiments verify the effectiveness of our model.
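
To make the motivation for hyperbolic space concrete, the following is a minimal sketch of why such a geometry suits tree-like structure. It is not the authors' implementation. The abstract does not say which hyperbolic model the paper uses, so the Poincaré ball with curvature -1 is an assumption here, and all function names (`mobius_add`, `poincare_distance`, `expmap0`, `sibling_ratio`) are hypothetical helpers introduced for illustration. The sketch places a root at the origin and two children at the same tangent radius in orthogonal directions, then compares the direct child-to-child distance with the path that goes through the root.

```python
import math

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball of curvature -c (assumed model)."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the ball."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * norm)

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in v]

def sibling_ratio(r):
    """Direct sibling distance divided by the distance via the root.

    Two hypothetical child nodes sit at tangent radius r from the root,
    in orthogonal directions.
    """
    root = [0.0, 0.0]
    child_a = expmap0([r, 0.0])
    child_b = expmap0([0.0, r])
    via_root = poincare_distance(child_a, root) + poincare_distance(root, child_b)
    return poincare_distance(child_a, child_b) / via_root

# The ratio moves toward 1 as the children sit deeper in the ball.
for r in (0.5, 1.0, 3.0):
    print(f"tangent radius {r}: ratio = {sibling_ratio(r):.3f}")
```

In the Euclidean plane this ratio would stay at about 0.707 for every radius. In the ball it rises toward 1 as the radius grows, so the shortest path between two siblings increasingly resembles the path through their parent, which is how distances behave in a tree.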

Paper Structure

This paper contains 16 sections, 19 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: (a) Graph hierarchy, (b) topic hierarchy.
  • Figure 2: Illustration of (a) our proposed GTFormer, (b) topic tree embedding, and (c) Hyperbolic Doubly Recurrent Neural Network. Hyperbolic operations are omitted for clarity. Best seen in color.
  • Figure 3: Ablation analysis of our model. Best seen in color.
  • Figure 4: Topic tree structure learned on PL dataset.
  • Figure 5: Visualization on ML dataset.