An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings

Matías Piqueras, Alexandra Segerberg, Matteo Magnani, Måns Magnusson, Nataša Sladoje

TL;DR

The paper addresses the challenge of analyzing visual political content at scale by coupling pretrained image embeddings with a structural topic model, yielding a visual Structural Topic Model (vSTM). The model lets images exhibit mixed topic memberships and relates topic prevalence to covariates via a logistic-normal prior. It formalizes a generative model in which image embeddings $\bm{z}_i$ are drawn from a mixture of topic embeddings $\bm{\beta}_k$ with topic proportions $\bm{\theta}_i$, where $\bm{\theta}_i$ depends on covariates through $\bm{\Gamma}$ and a covariance structure $\bm{\Omega}_{\theta}$ with an LKJ prior. Inference uses mean-field variational methods with reparameterization and minibatching, which makes the analysis scalable. Quantities of interest include posterior means of topics and their covariate-driven prevalence. The authors apply the model to COP-related Twitter images encoded with CLIP, perform model selection (choosing $K=45$), and validate topic coherence through intrusion tasks involving human raters. The empirical results reveal distinct visual worlds by actor and stance, with interpretable topics and meaningful visual co-occurrence patterns, and demonstrate the framework’s potential for multimodal and cross-platform research. The work highlights the importance of pretrained embeddings for political image analysis, discusses limitations such as embedding biases, and points to future directions including improved interpretability and open-source tooling.
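The generative story summarized above can be sketched in a few lines of plain Python. This is an illustrative simplification, not the authors' implementation: the summary does not state the likelihood, so the sketch assumes Gaussian noise around the $\bm{\theta}_i$-weighted combination of topic embeddings, and it replaces the LKJ-based covariance with independent topic scores (a diagonal covariance). The names `sample_image`, `eta_sd`, and `noise_sd` are hypothetical.

```python
import math
import random

def softmax(eta):
    """Map real-valued topic scores to proportions on the simplex."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

def sample_image(x, gamma, beta, eta_sd=1.0, noise_sd=0.1, rng=random):
    """Draw (theta, z) for one image with covariates x.

    gamma: P x K prevalence coefficients; beta: K x D topic embeddings.
    Assumptions (not from the paper summary): independent topic scores
    and Gaussian noise around the theta-weighted mix of topic embeddings.
    """
    K, D = len(beta), len(beta[0])
    # Covariates shift the mean score of each topic.
    mean = [sum(x[p] * gamma[p][k] for p in range(len(x))) for k in range(K)]
    eta = [rng.gauss(mean[k], eta_sd) for k in range(K)]
    theta = softmax(eta)  # logistic-normal topic proportions
    # Image embedding: convex combination of topic embeddings plus noise.
    z = [sum(theta[k] * beta[k][d] for k in range(K)) + rng.gauss(0.0, noise_sd)
         for d in range(D)]
    return theta, z

# Toy example: intercept plus one covariate, K = 3 topics, D = 2 dimensions.
rng = random.Random(0)
gamma = [[0.0, 0.0, 0.0], [2.0, 0.0, -2.0]]
beta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
theta, z = sample_image([1.0, 1.0], gamma, beta, rng=rng)
```

Because `theta` lies on the simplex, each image is a mixture of several topics rather than a member of a single cluster. The covariate row of `gamma` raises the expected share of the first topic and lowers that of the third.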

Abstract

Political scientists are increasingly interested in analyzing visual content at scale. However, the existing computational toolbox is still in need of methods and models attuned to the specific challenges and goals of social and political inquiry. In this article, we introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model. This has important advantages compared to existing approaches. First, pretrained embeddings allow the model to capture the semantic complexity of images relevant to political contexts. Second, the structural topic model provides the ability to analyze how topics and covariates are related, while maintaining a nuanced representation of images as a mixture of multiple topics. In our empirical application, we show that the vSTM is able to identify topics that are interpretable, coherent, and substantively relevant to the study of online political communication.

Paper Structure

This paper contains 15 sections, 11 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Plate diagram of the generative model of vSTM. $\bm{\beta}_k$ is a topic embedding, $\bm{\theta}_i$ is the topic proportions for image $i$, $\bm{\Gamma}$ is the topic prevalence coefficients, $\bm{z}_i$ is the image embedding and $\bm{x}_i$ are image-level covariates.
  • Figure 2: Diagnostics for model selection. Left panel shows coherence and exclusivity for each model, averaged over all topics. Right panel shows perplexity, averaged over five held-out sets.
  • Figure 3: Left panel: Blue points are images and yellow points topics. Right panel: 25 randomly sampled images in the vicinity of the "Causes/consequences" and "Event flyers" topics.
  • Figure 4: Six images and their top five topics.
  • Figure 5: Graph displaying a subset of positively correlated topics. Two nodes are connected if $\bm{\Omega}_{ij} > 0.1$. Edge width is proportional to the strength of the correlation, and clusters are colored according to the partition that maximizes modularity.
  • ...and 4 more figures
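
The graph construction described in the Figure 5 caption can be illustrated with a small sketch. It thresholds a topic correlation matrix at 0.1 to obtain weighted edges and scores a candidate partition by weighted modularity. The caption does not say which algorithm the authors use to find the modularity-maximizing partition, so this sketch only evaluates partitions that are supplied to it. The function names and the toy matrix are hypothetical.

```python
def correlation_edges(omega, threshold=0.1):
    """Return weighted edges (i, j, w) for topic pairs with correlation > threshold."""
    n = len(omega)
    return [(i, j, omega[i][j])
            for i in range(n) for j in range(i + 1, n)
            if omega[i][j] > threshold]

def modularity(edges, community):
    """Weighted modularity of a partition; community maps node -> label."""
    total = sum(w for _, _, w in edges)  # total edge weight m
    if total == 0:
        return 0.0
    # Node strength: sum of the weights of incident edges.
    strength = {}
    for i, j, w in edges:
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
    # Fraction of edge weight that falls inside communities.
    inside = sum(w for i, j, w in edges if community[i] == community[j]) / total
    # Expected fraction under the degree-preserving null model.
    by_comm = {}
    for node, s in strength.items():
        label = community[node]
        by_comm[label] = by_comm.get(label, 0.0) + s
    expected = sum((s / (2 * total)) ** 2 for s in by_comm.values())
    return inside - expected

# Toy 4-topic correlation matrix: topics 0-1 and 2-3 are positively correlated.
omega = [
    [1.00, 0.50, 0.02, -0.10],
    [0.50, 1.00, 0.05, 0.00],
    [0.02, 0.05, 1.00, 0.40],
    [-0.10, 0.00, 0.40, 1.00],
]
edges = correlation_edges(omega)
q_split = modularity(edges, {0: "a", 1: "a", 2: "b", 3: "b"})
q_single = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "a"})
```

In this toy case the two-cluster partition scores higher than placing all topics in one cluster, which is the kind of comparison a modularity-maximizing procedure performs over many candidate partitions.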