Visualizing Temporal Topic Embeddings with a Compass

Daniel Palamarchuk, Lemara Williams, Brian Mayer, Thomas Danielson, Rebecca Faust, Larry Deschaine, Chris North

TL;DR

This paper proposes an expansion of the compass-aligned temporal Word2Vec methodology into dynamic topic modeling, which allows for the direct comparison of word and document embeddings across time in dynamic topics.

Abstract

Dynamic topic modeling is useful for discovering how latent topics develop and change over time. However, present methodology relies on algorithms that keep document and word representations separate. This prevents the creation of a meaningful embedding space in which changes in word usage and documents can be directly analyzed in a temporal context. This paper proposes an expansion of the compass-aligned temporal Word2Vec methodology into dynamic topic modeling. Such a method allows for the direct comparison of word and document embeddings across time in dynamic topics. This enables topic visualizations that incorporate temporal word embeddings in the context of the surrounding documents. In experiments against the current state of the art, our proposed method demonstrates competitive overall performance in topic relevancy and diversity across temporal datasets of varying size. Simultaneously, it provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.
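The compass idea that the abstract builds on (TWEC) can be sketched compactly. A compass model is trained on the whole corpus, and each time slice then learns only its own word vectors against the compass's frozen context matrix, so all slices share one coordinate system. The sketch below is a minimal NumPy skip-gram with negative sampling under stated assumptions. The toy corpora, hyperparameters, uniform negative sampling, random initialization of slice vectors, and all function names are illustrative choices of ours, not the authors' code. It covers only the word-level compass, not the Doc2vec (TDEC) document compass or the UMAP/HDBSCAN topic steps of TTEC.

```python
import numpy as np

# Toy, made-up monthly corpora (hypothetical data for illustration only).
april = [["purex", "detergent", "laundry", "soap"],
         ["laundry", "soap", "clean", "purex"]]
may = [["purex", "plutonium", "uranium", "reactor"],
       ["reactor", "uranium", "radioactive", "purex"]]


def build_vocab(corpora):
    """Shared vocabulary over every slice, so row i means the same word everywhere."""
    words = sorted({w for corpus in corpora for doc in corpus for w in doc})
    return {w: i for i, w in enumerate(words)}


def skipgram_pairs(corpus, vocab, window=2):
    """(target, context) id pairs from a symmetric window."""
    pairs = []
    for doc in corpus:
        ids = [vocab[w] for w in doc]
        for i, t in enumerate(ids):
            for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                if j != i:
                    pairs.append((t, ids[j]))
    return pairs


def sgns(pairs, V, dim, rng, ctx=None, freeze_ctx=False,
         epochs=50, lr=0.05, neg=3):
    """Skip-gram with negative sampling trained by plain SGD (hypothetical helper).

    Returns (target_matrix, context_matrix). With freeze_ctx=True the given
    context matrix is read but never updated: this is the compass alignment.
    """
    tgt = (rng.random((V, dim)) - 0.5) / dim
    if ctx is None:
        ctx = np.zeros((V, dim))
    labels = np.zeros(neg + 1)
    labels[0] = 1.0  # row 0 is the true context, the rest are uniform negatives
    for _ in range(epochs):
        for t, c in pairs:
            rows = np.concatenate(([c], rng.integers(0, V, size=neg)))
            u = ctx[rows]  # fancy indexing returns a copy
            g = labels - 1.0 / (1.0 + np.exp(-(u @ tgt[t])))
            if not freeze_ctx:
                np.add.at(ctx, rows, lr * np.outer(g, tgt[t]))
            tgt[t] += lr * (g @ u)
    return tgt, ctx


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vocab = build_vocab([april, may])
V, dim = len(vocab), 8
rng = np.random.default_rng(0)

# 1) Compass: train on all slices concatenated, updating both matrices.
_, compass_ctx = sgns(skipgram_pairs(april + may, vocab), V, dim, rng)
frozen_copy = compass_ctx.copy()

# 2) Slices: reuse the compass context matrix, frozen; learn only target vectors.
W_apr, _ = sgns(skipgram_pairs(april, vocab), V, dim, rng,
                ctx=compass_ctx, freeze_ctx=True)
W_may, _ = sgns(skipgram_pairs(may, vocab), V, dim, rng,
                ctx=compass_ctx, freeze_ctx=True)

# Both slices share one context space, so their vectors are directly comparable.
p = vocab["purex"]
print("purex April-vs-May cosine:", round(cosine(W_apr[p], W_may[p]), 3))
```

In this sketch, words absent from a slice (for example "detergent" in May) keep their random initial vectors, so only words occurring in both slices should be compared.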

Paper Structure (32 sections, 8 figures, 1 table)

Figures (8)

  • Figure 1: Heatmap represents total change in cosine similarities for each term for temporal word embeddings made using TWEC.
  • Figure 2: Scatterplot of UMAP representation of term "purex" between April and May in relation to other key terms.
  • Figure 3: Architecture diagram
  • Figure 4: TTEC Architecture. Step (1) illustrates how the TTEC compass is trained: a TDEC Doc2vec compass is created (1A, 1B), a topic space is built from the documents using UMAP and HDBSCAN (1C), and a topic description is generated using global word vectors (1D). Step (2) shows how individual time slices are trained. A TDEC Doc2vec time slice is trained (2C) using the local time slice corpus (2A) and the compass hidden weights (2B). The local documents are then placed into the global topic space to create local topics (2D) and topic descriptions (2E). The local word vectors and topics are then used in TimeLink (see the visualization-pipeline and Sankey-dashboard figures).
  • Figure 5: Baseline Purex comparison with documents and DTM. In April, Purex is near topics related to health (aqua), nutrition (red), and home appliances (green). In May, Purex is near geology/radioactive material (purple), archeology (blue), and radioactive material/misc. (orange).
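Figure 1 summarizes, per term, how much its cosine similarities change between time slices of TWEC embeddings. The caption does not give the exact formula, so the sketch below assumes one plausible reading: for each consecutive pair of compass-aligned slices, sum the absolute change in a term's cosine similarity to every vocabulary term. The function names and the (steps × terms) layout are hypothetical.

```python
import numpy as np


def cosine_matrix(W):
    """Pairwise cosine similarities between the rows (terms) of W."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    return Wn @ Wn.T


def total_similarity_change(slices):
    """slices: list of (V, d) compass-aligned matrices, one per time slice.

    Returns a (T-1, V) array whose entry [k, i] is the summed absolute change
    in term i's cosine similarity to every term between slices k and k+1.
    """
    sims = [cosine_matrix(W) for W in slices]
    return np.stack([np.abs(b - a).sum(axis=1) for a, b in zip(sims, sims[1:])])


# Tiny deterministic check: term 1 moves onto term 0's direction.
W_t0 = np.eye(3)
W_t1 = np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
print(total_similarity_change([W_t0, W_t1]))  # [[1. 1. 0.]]
```

Terms 0 and 1 each gain a similarity of 1 to the other, while term 2 is unaffected. Transposing the result gives a term-by-time grid of the kind a heatmap like Figure 1 displays.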