A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
Heger Arfaoui, Mohammed Iheb Hergli, Beya Benzina, Slimane BenMiled
TL;DR
The paper addresses the challenge of reproducibly extracting themes from small, heterogeneous focus group transcripts by applying a systematic BERTopic framework with comprehensive hyperparameter exploration and bootstrap stability analysis. It demonstrates that a 7-topic BERTopic model achieves higher coherence (0.573) than a tuned LDA baseline (0.486) and gains validation from domain experts (ICC 0.700, kappa 0.678). By transparently documenting modeling decisions, evaluating across multiple metrics, and providing complete code, the work offers a reproducible template for qualitative researchers to scale thematic synthesis while preserving interpretability. The findings underscore the value of contextual embeddings for conversation-rich data and outline practical guidance for applying neural topic modeling to focus group analyses in health research and beyond.
Abstract
Focus group discussions generate rich qualitative data, but their analysis traditionally relies on labor-intensive manual coding that limits scalability and reproducibility. We present a systematic framework for applying BERTopic to focus group transcripts, using data from ten focus groups exploring HPV vaccine perceptions in Tunisia (1,075 utterances). We conducted comprehensive hyperparameter exploration across 27 configurations, evaluating each through bootstrap stability analysis, performance metrics, and comparison with an LDA baseline. Bootstrap analysis revealed that the two stability metrics (NMI and ARI) disagreed strongly with each other (r = -0.691) and showed divergent relationships with coherence, demonstrating that stability is multifaceted rather than monolithic. Our multi-criteria selection framework yielded a 7-topic model achieving 18% higher coherence than optimized LDA (0.573 vs. 0.486), with interpretable topics validated through independent human evaluation (ICC = 0.700, weighted Cohen's kappa = 0.678). These findings demonstrate that transformer-based topic modeling can extract interpretable themes from small focus group transcript corpora when systematically configured and validated, while revealing that quality metrics capture distinct, sometimes conflicting constructs that require multi-criteria evaluation. We provide complete documentation and code to support reproducibility.
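To make the two stability metrics concrete, the following is a minimal sketch of how NMI and ARI can be computed between two topic assignments of the same utterances, for example the assignments from two bootstrap runs. It is an illustration only and not the authors' released code: the function names are hypothetical, and the arithmetic-mean normalization for NMI is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected pairwise agreement between two labelings."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical
    # Pairs of items that share a cell of the contingency table.
    same_cell = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of items that share a cluster within each labeling.
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / comb(n, 2)
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (same_cell - expected) / (maximum - expected)


def normalized_mutual_info(labels_a, labels_b):
    """Mutual information divided by the mean of the two entropies.

    Assumption: arithmetic-mean normalization; other variants exist.
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * log(n * c / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    mean_entropy = (entropy_a + entropy_b) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings are a single cluster
    return mutual_info / mean_entropy


# Hypothetical topic assignments for six utterances from two runs.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 2]  # same partition, different topic ids
ari = adjusted_rand_index(run_1, run_2)
nmi = normalized_mutual_info(run_1, run_2)
```

Both metrics ignore topic ids and depend only on how utterances are grouped, so the relabeled partition above counts as perfect agreement. ARI counts agreeing pairs and corrects for chance, whereas NMI measures shared information, which is one reason the two can rank a set of configurations differently.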
