Moving Beyond LDA: A Comparison of Unsupervised Topic Modelling Techniques for Qualitative Data Analysis of Online Communities
Amandeep Kaur, James R. Wallace
TL;DR
This work addresses the barrier that qualitative researchers face in applying topic modelling to large social media corpora. It evaluates three unsupervised techniques (LDA, NMF, and BERTopic) by integrating BERTopic into the Computational Thematic Analysis Toolkit and conducting interviews with qualitative researchers. Results show that BERTopic delivers superior topic coherence and diversity and is better able to reveal hidden relationships, albeit with higher computational and navigational complexity; researchers nevertheless valued its granularity and interpretability. The study demonstrates the potential of LLM-based methods to enhance qualitative analysis workflows and provides design guidance for usable, ethical, and explainable tooling, with future work focusing on hierarchical visualizations and broader, longitudinal evaluations.
Abstract
Social media constitutes a rich and influential source of information for qualitative researchers. Although computational techniques like topic modelling assist with managing the volume and diversity of social media content, qualitative researchers' lack of programming expertise creates a significant barrier to their adoption. In this paper, we explore how BERTopic, an advanced Large Language Model (LLM)-based topic modelling technique, can support qualitative data analysis of social media. We conducted interviews and hands-on evaluations in which qualitative researchers compared topics from three modelling techniques: LDA, NMF, and BERTopic. BERTopic was favoured by 8 of 12 participants for its ability to provide detailed, coherent clusters for deeper understanding and actionable insights. Participants also prioritised topic relevance, logical organisation, and the capacity to reveal unexpected relationships within the data. Our findings underscore the potential of LLM-based techniques for supporting qualitative analysis.
