Neural Multimodal Topic Modeling: A Comprehensive Evaluation
Felipe González-Pizarro, Giuseppe Carenini
TL;DR
The paper addresses how to perform topic analysis on multimodal documents comprising text and images, an area where traditional topic models struggle. It introduces two neural multimodal topic models, Multimodal-ZeroShotTM and Multimodal-Contrast, adapted from CTM and M3L-Contrast respectively, and two image-focused evaluation metrics, IEC and IEPS, to assess topic-image coherence and diversity. A six-dataset benchmark with automatic and human evaluations shows that both models yield coherent and diverse topics, but their relative strengths depend on the dataset and metric; the IEC and IEPS results align with human judgments. The work provides a practical framework for evaluating multimodal topic models and highlights opportunities for hybrid approaches and future integration with large language models, while acknowledging language limitations and other ethical considerations.
Abstract
Neural topic models can successfully find coherent and diverse topics in textual data. However, they are limited in dealing with multimodal datasets (e.g., images and text). This paper presents the first systematic and comprehensive evaluation of multimodal topic modeling of documents containing both text and images. In the process, we propose two novel topic modeling solutions and two novel evaluation metrics. Overall, our evaluation on an unprecedentedly rich and diverse collection of datasets indicates that both of our models generate coherent and diverse topics. Nevertheless, the extent to which one method outperforms the other depends on the combination of metric and dataset, which suggests further exploration of hybrid solutions in the future. Notably, our succinct human evaluation aligns with the outcomes determined by our proposed metrics. This alignment not only reinforces the credibility of our metrics but also highlights their potential for guiding future multimodal topic modeling endeavors.
