Efficiency with Rigor! A Trustworthy LLM-powered Workflow for Qualitative Data Analysis

Jie Gao; Zhiyao Shu; Shun Yi Yeo; Alok Prakash; Chien-Ming Huang; Mark Dredze; Ziang Xiao

TL;DR

MindCoder addresses the challenge of balancing efficiency and trustworthiness in qualitative data analysis by delegating mechanical tasks to a transparent LLM workflow while preserving interpretive work for humans. It derives six design requirements from literature and formative interviews (DR1–DR6) and implements them in a web-based system with an auditable codebook trajectory. In a within-subject study against Atlas.ti AI Coding, MindCoder improved active interpretation, flexible control, and perceived trustworthiness of results, with user experiences varying by expertise. The work contributes a concrete, user-centered blueprint for human-AI collaboration in QDA and offers design implications for future LLM-powered qualitative analysis tools.

Abstract

Qualitative data analysis (QDA) emphasizes trustworthiness, requiring sustained human engagement and reflexivity. Recently, large language models (LLMs) have been applied in QDA to improve efficiency. However, their use raises concerns about unvalidated automation and displaced sensemaking, which can undermine trustworthiness. To address these issues, we employed two strategies: transparency and human involvement. Through a literature review and formative interviews, we identified six design requirements for transparent automation and meaningful human involvement. Guided by these requirements, we developed MindCoder, an LLM-powered workflow that delegates mechanical tasks, such as grouping and validation, to the system, while enabling humans to conduct meaningful interpretation. MindCoder also maintains comprehensive logs of users' step-by-step interactions to ensure transparency and support trustworthy results. In an evaluation with 12 users and two external evaluators, MindCoder supported active interpretation, offered flexible control, and produced more trustworthy codebooks. We further discuss design implications for building human-AI collaborative QDA workflows.

Paper Structure (93 sections, 1 equation, 10 figures, 3 tables)

Introduction
Related Work
Trustworthiness and Strategies in QDA
LLM-powered QDA Workflow
LLM-led analysis
Human-led analysis
Human Involvement in QDA
Design Requirements
Narrative Literature Review
Method
Results
DR1: Automating the whole pipeline transparently through prompt chaining
Initial DR for human involvement
Formative Interview
Participants
...and 78 more sections

Figures (10)

Figure 1: An overview of our motivation and method.
Figure 2: MindCoder: A Trustworthy LLM-Powered Workflow for Qualitative Analysis. Users can perform QDA through the following steps: Step 1. Use MindCoder to conduct primary analysis. Step 2. Build on the primary analysis to perform sensemaking and interpretation, and save the results in a trustworthy codebook. Step 3. Leverage the trustworthy codebook to support downstream tasks, such as group discussions or secondary interpretations.
Figure 3: Interface for "Stage 1: Generating Open Codes" in "Step 1" of Figure 2. (1) MindCoder’s Mechanical Task: the LLM (1a) reports “What LLM did” during open coding and (1b) provides a self-critique. (2) Human Interpretation: the user (2a) specifies the number of open codes to generate, (2b) writes prompts to regroup clusters and assign open code names, which automatically updates the rationales for the new grouping in (1a) and (1b), and (2c) adds interpretive memos. (3) Displayed Output: MindCoder presents the LLM-generated open codes. (3a) Users can regenerate codes based on prompts from (2b), and (3b) the LLM can update subthemes and themes to reflect new groupings.
Figure 4: Interface for "Stage 2: Generating Sub-themes" in "Step 1" of Figure 2. (1) MindCoder’s Mechanical Task: the LLM (1a) reports “what it did” during open coding and (1b) provides a self-critique. (2) Human Interpretation: the user (2a) writes prompts to regroup clusters and assign subtheme names, which automatically updates the rationales in (1a) and (1b), and (2b) adds interpretive memos. (3) Displayed Subthemes: MindCoder presents the LLM-generated subthemes. (3a) Users can review open codes under each subtheme, (3b) see any ungrouped codes (none in this example), (3c) regenerate results based on prompts from (2a), and (3d) update the rest of the coding to align with the new subthemes.
Figure 5: The Structure of MindCoder’s Trustworthy Codebook with a Transparent Trajectory. 1) Key Finding Summary shows high-level synthesized takeaways for readers; 2) Theme Map & Primary Codebook shows structured codes, sub-themes, and themes with exemplar quotes; 3) Codebook Trajectory shows the stepwise progression from open codes to themes, including the user's interpretation and reflection at each step; 4) Disclaimer clarifies the limitations of AI-generated content and reinforces the interpretive role of human analysts. We show a sample trustworthy codebook in supplementary materials.
...and 5 more figures
