A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis

Heger Arfaoui, Mohammed Iheb Hergli, Beya Benzina, Slimane BenMiled

TL;DR

The paper addresses the challenge of reproducibly extracting themes from small, heterogeneous focus group transcripts by applying a systematic BERTopic framework with comprehensive hyperparameter exploration and bootstrap stability analysis. It shows that a 7-topic BERTopic model achieves higher coherence (0.573) than a tuned LDA baseline (0.486) and that its topics are validated by domain experts (ICC = 0.700, weighted Cohen's kappa = 0.678). By transparently documenting modeling decisions, evaluating across multiple metrics, and providing complete code, the work offers a reproducible template for qualitative researchers to scale thematic synthesis while preserving interpretability. The findings underscore the value of contextual embeddings for conversation-rich data and outline practical guidance for applying neural topic modeling to focus group analyses in health research and beyond.

Abstract

Focus group discussions generate rich qualitative data, but their analysis traditionally relies on labor-intensive manual coding that limits scalability and reproducibility. We present a systematic framework for applying BERTopic to focus group transcripts using data from ten focus groups exploring HPV vaccine perceptions in Tunisia (1,075 utterances). We conducted comprehensive hyperparameter exploration across 27 configurations, evaluating each through bootstrap stability analysis, performance metrics, and comparison with an LDA baseline. Bootstrap analysis revealed that stability metrics (NMI and ARI) exhibited strong disagreement (r = -0.691) and showed divergent relationships with coherence, demonstrating that stability is multifaceted rather than monolithic. Our multi-criteria selection framework yielded a 7-topic model achieving 18% higher coherence than optimized LDA (0.573 vs. 0.486) with interpretable topics validated through independent human evaluation (ICC = 0.700, weighted Cohen's kappa = 0.678). These findings demonstrate that transformer-based topic modeling can extract interpretable themes from small focus group transcript corpora when systematically configured and validated, while revealing that quality metrics capture distinct, sometimes conflicting constructs requiring multi-criteria evaluation. We provide complete documentation and code to support reproducibility.
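
The 18% figure quoted in the abstract can be checked directly from the two coherence scores. The short sketch below computes the relative gain of the BERTopic score over the LDA score; the function name `relative_gain` is an illustrative choice and does not come from the paper's code.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Return the relative improvement of `new` over `baseline` as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Coherence scores as reported in the abstract.
bertopic_coherence = 0.573
lda_coherence = 0.486

gain = relative_gain(bertopic_coherence, lda_coherence)
print(f"Relative coherence gain: {gain:.1%}")  # prints "Relative coherence gain: 17.9%"
```

The result, about 17.9%, rounds to the 18% stated in the abstract. Note that this is a relative gain; the absolute difference between the two scores is 0.087.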

Paper Structure

This paper contains 26 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Systematic pipeline for topic modeling of focus group transcripts. The workflow begins with transcription and translation, followed by embedding generation. A grid search explores the chosen hyperparameter configurations, evaluating each through bootstrap stability analysis (NMI, ARI) and performance metrics (coherence, topic count, outlier fraction). Multi-criteria selection balances these metrics, and final validation by domain experts yields the interpretable topic structure.
  • Figure 2: Quality metrics across the complete hyperparameter space. Six metrics are displayed as heatmaps across three minimum cluster sizes (columns). Each heatmap shows UMAP n_neighbors (rows: 10, 15, 30) by UMAP n_components (columns: 5, 10, 50). Top panels show bootstrap stability (NMI and ARI), middle panels show topic quality (coherence and number of topics), and bottom panels show coverage metrics (outlier fraction and main topic proportion).
  • Figure 3: Two-dimensional visualization of document embeddings colored by final merged topic assignments. Each point represents a document (utterance) from the focus group transcripts. Colored regions show the 7 merged topics, with spatial proximity indicating semantic similarity.
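
The Figure 2 caption describes a grid of three UMAP n_neighbors values, three UMAP n_components values, and three minimum cluster sizes, which accounts for the 27 configurations mentioned in the abstract. A minimal sketch of how such a grid might be enumerated follows. The n_neighbors and n_components values are taken from the caption; the minimum cluster sizes are not stated in this section, so the values in `MIN_CLUSTER_SIZES` are hypothetical placeholders, as are the function and key names.

```python
from itertools import product

# Values listed in the Figure 2 caption.
N_NEIGHBORS = (10, 15, 30)
N_COMPONENTS = (5, 10, 50)

# ASSUMPTION: the three minimum cluster sizes are not given in this section;
# these are placeholder values for illustration only.
MIN_CLUSTER_SIZES = (5, 10, 15)


def build_grid(min_cluster_sizes=MIN_CLUSTER_SIZES,
               n_neighbors=N_NEIGHBORS,
               n_components=N_COMPONENTS):
    """Return one dict per hyperparameter configuration (full Cartesian product)."""
    return [
        {"min_cluster_size": mcs, "n_neighbors": nn, "n_components": nc}
        for mcs, nn, nc in product(min_cluster_sizes, n_neighbors, n_components)
    ]


grid = build_grid()
print(len(grid))  # prints 27
```

Each dictionary could then be passed to whatever routine fits and scores a model for that configuration, so that every cell in the heatmaps corresponds to exactly one entry in `grid`.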