Studying the Effects of Collaboration in Interactive Theme Discovery Systems
Alvin Po-Chun Chen, Rohan Das, Dananjay Srinivas, Alexandra Barry, Maksim Seniw, Maria Leonor Pacheco
TL;DR
This work addresses the lack of standardized evaluation for NLP-assisted qualitative coding by proposing a framework that assesses the consistency, cohesiveness, and correctness of outputs under synchronous and asynchronous collaboration. It experimentally compares three diverse interactive tools (topic-model-based, relational, and LLM-based) on a large COVID-19 vaccine tweet dataset. Key findings show that collaboration modality markedly influences output quality for some tools, with synchronous deliberation boosting consistency and cohesiveness, while LLM-based approaches raise concerns about scalability and reliability. The paper provides actionable recommendations and a generalizable evaluation framework to guide robust, real-world assessments of human-in-the-loop (HitL) qualitative coding tools.
Abstract
NLP-assisted solutions have gained considerable traction as support for qualitative data analysis. However, no unified evaluation framework exists that can account for the many different settings in which qualitative researchers may employ them. In this paper, we take a first step in this direction by proposing an evaluation framework to study how different tools may produce different outcomes depending on the collaboration strategy employed. Specifically, we study the impact of synchronous vs. asynchronous collaboration using two different NLP-assisted qualitative research tools and present a comprehensive analysis of significant differences in the consistency, cohesiveness, and correctness of their outputs.
