A Multiscale Geometric Method for Capturing Relational Topic Alignment

Conrad D. Hougen, Karl T. Pazdernik, Alfred O. Hero

TL;DR

The paper addresses the challenge of tracking niche, time-evolving topics within co-authorship networks using interpretable models. It introduces MSTML, a multiscale geometric framework that fuses time-sliced LDA ensembles with a topic-space dendrogram guided by Hellinger distances and Ward's linkage, visualized via PHATE embeddings. The approach yields smooth temporal topic alignment and interpretable visualizations, identifying rare-topic structure that transformer-based methods often overlook, albeit with some trade-offs in topic coherence. Overall, MSTML offers a principled, scalable alternative for monitoring scientific novelty and topic drift across time and collaboration networks.

Abstract

Interpretable topic modeling is essential for tracking how research interests evolve within co-author communities. In scientific corpora, where novelty is prized, identifying underrepresented niche topics is particularly important. However, contemporary models built from dense transformer embeddings tend to miss rare topics and therefore also fail to capture smooth temporal alignment. We propose a geometric method that integrates multimodal text and co-author network data, using Hellinger distances and Ward's linkage to construct a hierarchical topic dendrogram. This approach captures both local and global structure, supporting multiscale learning across semantic and temporal dimensions. Our method effectively identifies rare-topic structure and visualizes smooth topic drift over time. Experiments highlight the strength of interpretable bag-of-words models when paired with principled geometric alignment.
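
The dendrogram construction named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so the following is an assumption-laden toy rather than the authors' pipeline: the topic-word matrix is made up, the helper name `hellinger_embed` is hypothetical, and NumPy/SciPy are assumed as tooling. It relies on the fact that the Hellinger distance between two distributions equals the Euclidean distance between their element-wise square roots, scaled by 1/sqrt(2), so Ward's linkage (which assumes Euclidean geometry) can be run directly on the square-root-transformed topics.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def hellinger_embed(topics):
    """Map each row (a topic-word distribution) to a point whose Euclidean
    distance to any other mapped row equals their Hellinger distance."""
    topics = np.asarray(topics, dtype=float)
    topics = topics / topics.sum(axis=1, keepdims=True)  # renormalize rows
    return np.sqrt(topics) / np.sqrt(2.0)


# Hypothetical toy data: 4 topics over a 5-word vocabulary.
# Rows 0-1 share one theme, rows 2-3 another.
topics = np.array([
    [0.70, 0.20, 0.05, 0.03, 0.02],
    [0.65, 0.25, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.30, 0.60],
    [0.02, 0.03, 0.10, 0.25, 0.60],
])

points = hellinger_embed(topics)

# Condensed vector of pairwise Hellinger distances (each lies in [0, 1]).
hellinger = pdist(points, metric="euclidean")

# Ward's linkage on the embedded points builds the hierarchical tree.
tree = linkage(points, method="ward")

# Cutting the tree into two groups stands in for choosing meta-topics
# at one scale of the hierarchy.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Cutting the same tree at different heights would give coarser or finer groupings, which is one way to read the "multiscale" aspect described above. How the paper selects its cut levels is not stated in this summary.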

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: The topic space dendrogram (hierarchical tree, top) links chunk topics (multi-colored rectangles), meta-topics (m13-m23), and the co-author network (network, below). Several example links (red, pink) between the dendrogram and co-author network are included. These links represent author distributions over multiple chunk topic leaf nodes.
  • Figure 2: (Top) Hellinger-PHATE embedding of the topic manifold with 9 meta topics. (Left Panels) Dorsa Sadigh is starred; network edges represent RL/Robotics community publications during 3 time windows. (Right Panels) Star sizes are proportional to Sadigh's inferred topic distributions across topic points from each time window; colors represent meta topics.
  • Figure 3: PHATE embeddings for time alignment comparison. BERTopic (left) shows tight, discrete clusters, while the LDA ensemble (right) is smoother, with clear temporal clustering.

Theorems & Definitions (3)

  • Definition 1: Topic Neighborhood Overlap
  • Definition 2: Exponential Temporal Spectral Gap
  • Definition 3: Co-Author Network