MimiTalk: Revolutionizing Qualitative Research with Dual-Agent AI

Fengming Liu, Shubin Yu

TL;DR

MimiTalk presents a dual-agent constitutional AI framework for scalable qualitative research, combining a supervisor that ensures ethical oversight with a conversational agent that generates interview questions. Through Study 1 (usability), Study 2 (large-scale AI vs. human interviews on MediaSum data), and Study 3 (cross-disciplinary human-AI interviews), the work demonstrates that AI-led interviews can achieve higher information richness, lexical diversity, and semantic coherence than human-led equivalents, while human interviews retain advantages in cultural and emotional nuance. Propensity score matching provides causal evidence, under the assumption that the matched covariates capture the relevant confounders, that AI interviewing improves several linguistic quality metrics. A detailed qualitative analysis highlights complementary strengths and boundary conditions for AI in handling sensitive or culturally nuanced topics. The MimiTalk framework thus offers a scalable, quality-controlled paradigm for human–AI collaboration in qualitative research, with implications for broad domain applications and future longitudinal validation.
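
The supervisor/conversational split can be pictured as a generate-then-review loop. The sketch below illustrates that idea only. It is not the MimiTalk implementation. Both agents are rule-based stand-ins for LLM calls, and the names `conversational_agent`, `supervisor_review`, `PRINCIPLES` and the fallback question are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical "constitution": principle name -> phrases that violate it.
# MimiTalk's actual constitutional principles are not reproduced here.
PRINCIPLES = {
    "no_leading_questions": ["don't you agree", "wouldn't you say"],
    "no_personal_identifiers": ["home address", "phone number"],
}


@dataclass
class InterviewState:
    topic: str
    history: list = field(default_factory=list)  # (speaker, text) pairs


def conversational_agent(state: InterviewState) -> str:
    """Stand-in for the question-generating model: drafts the next question."""
    answers = [text for speaker, text in state.history if speaker == "interviewee"]
    if not answers:
        return f"To start, how did you first get involved with {state.topic}?"
    # Follow up on the most recent answer (quoted, truncated to 40 characters).
    return f'Earlier you said: "{answers[-1][:40]}" Could you say more about that?'


def supervisor_review(question: str) -> list:
    """Stand-in for the supervisor: lists the principles a draft violates."""
    lowered = question.lower()
    return [name for name, phrases in PRINCIPLES.items()
            if any(phrase in lowered for phrase in phrases)]


def next_question(state: InterviewState):
    """One turn: the conversational agent drafts, then the supervisor vets the draft."""
    draft = conversational_agent(state)
    violations = supervisor_review(draft)
    if violations:
        # Hypothetical fallback: replace a rejected draft with a neutral open question.
        draft = f"Could you describe your experience with {state.topic} in your own words?"
    state.history.append(("interviewer", draft))
    return draft, violations
```

A real deployment would replace both stand-ins with model calls. It could also let the supervisor steer topic coverage (the "strategic oversight" role), which this sketch omits.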

Abstract

We present MimiTalk, a dual-agent constitutional AI framework designed for scalable and ethical conversational data collection in social science research. The framework integrates a supervisor model for strategic oversight and a conversational model for question generation. We conducted three studies: Study 1 evaluated usability with 20 participants; Study 2 compared 121 AI interviews to 1,271 human interviews from the MediaSum dataset using NLP metrics and propensity score matching; Study 3 involved 10 interdisciplinary researchers conducting both human and AI interviews, followed by blind thematic analysis. Results across studies indicate that MimiTalk reduces interview anxiety, maintains conversational coherence, and outperforms human interviews in information richness, coherence, and stability. AI interviews elicit technical insights and candid views on sensitive topics, while human interviews better capture cultural and emotional nuances. These findings suggest that dual-agent constitutional AI supports effective human-AI collaboration, enabling replicable, scalable, and quality-controlled qualitative research.
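
Study 2's propensity score matching pairs each AI interview with a human interview that has a similar estimated probability of being AI-led. It then compares outcomes within those pairs. The sketch below covers only the matching step and rests on three assumptions:

- Propensity scores have already been estimated, for example with a logistic regression on covariates. The covariates the study actually used are not listed in this summary.
- Matching is greedy 1:1 nearest-neighbor without replacement.
- The caliper is a hypothetical 0.05.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: lists of (unit_id, propensity_score, outcome) tuples,
    e.g. AI interviews (treated) and human interviews (control).
    A candidate pair whose score gap exceeds `caliper` is discarded.
    """
    available = list(control)
    pairs = []
    for unit in sorted(treated, key=lambda u: u[1]):
        if not available:
            break
        # Closest remaining control unit on the propensity score.
        best = min(available, key=lambda c: abs(c[1] - unit[1]))
        if abs(best[1] - unit[1]) <= caliper:
            pairs.append((unit, best))
            available.remove(best)  # without replacement: each control is used once
    return pairs


def average_treatment_effect_on_treated(pairs):
    """Mean within-pair outcome difference (treated minus matched control)."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(t[2] - c[2] for t, c in pairs) / len(pairs)
```

Matching on a single score balances the observed covariates only in expectation. The usual next step is to check covariate balance after matching, for example with standardized mean differences.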

Paper Structure

This paper contains 35 sections, 10 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Side-by-side display of the MimiTalk.app interview interface and system architecture: (a) The interview interface features a minimalist design to avoid confounding variables and focus on interview content; (b) The dual-agent collaborative architecture integrates constitutional principles and real-time context analysis.
  • Figure 2: Information entropy distributions comparing AI and human interviews. (a) Overall transcript entropy: AI interviews show significantly higher linguistic diversity (7.703 ± 0.399) than human interviews (7.273 ± 0.395), a 5.9% increase in entropy. (b) Interviewee response entropy: interviewees in AI-led interviews exhibit higher entropy (7.098 ± 0.791) than interviewees in human-led interviews (6.907 ± 0.446), indicating more diverse vocabulary in their responses. (c) Interviewer text entropy: AI interviewers show substantially higher entropy (7.325 ± 0.512) than human interviewers (6.762 ± 0.401), an 8.3% increase in question diversity.
  • Figure 3: Summary analyses. (a) Information entropy across all categories: the violin plots show that AI interviews achieve higher linguistic diversity in overall transcripts, interviewer questions, and interviewee responses. Their distributions are not tighter, however: the AI standard deviations reported in Figure 2 are larger in every category. (b) Kernel density estimates of token counts: AI interviews show broader, more right-skewed distributions for both interviewers and interviewees, indicating greater variability in response length than in human interviews. (c) Semantic similarity across all categories: the violin plots show that AI interviews score higher in semantic coherence on interviewer-internal, interviewee-internal, and cross-speaker similarity.
  • Figure 4: Token count distributions in AI and human interviews. (a) Interviewee response token count: interviewees in AI-led interviews produce significantly longer responses (24.710 ± 22.232 tokens per sentence) than those in human-led interviews (18.797 ± 15.453 tokens per sentence), a 31.4% increase in response length. (b) Interviewer question token count: AI interviewers generate slightly longer questions (16.500 ± 11.614 tokens) than human interviewers (14.534 ± 11.269 tokens), a 13.5% increase. (c) Overall token count distribution: AI interviews have higher average token counts (20.177 ± 17.678) than human interviews (17.003 ± 14.006). Their greater variability suggests more diverse response strategies.
  • Figure 5: Semantic similarity comparison between AI and human interviews. (a) Interviewer internal similarity: AI interviewers exhibit higher semantic consistency (0.6540 ± 0.0493) than human interviewers (0.5964 ± 0.0265), indicating more coherent questioning. (b) Interviewee internal similarity: interviewees in AI-led interviews show greater internal semantic consistency (0.6860 ± 0.0454) than those in human-led interviews (0.6446 ± 0.0278), suggesting more consistent responses within the same interview. (c) Cross-speaker similarity: semantic alignment between interviewer and interviewee is higher in AI interviews (0.6234 ± 0.0521 vs. 0.6057 ± 0.0218), reflecting greater conversational coherence and topic consistency.
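
Figure 2 reports information entropy per transcript and the relative gaps between conditions. One common way to compute such a score is the Shannon entropy of the word-frequency distribution. The sketch below assumes lowercase whitespace tokenization and log base 2; this summary specifies neither. It also recomputes the relative gaps from the caption means. The helper names `shannon_entropy` and `percent_increase` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits) of the word-frequency distribution of `text`.

    Assumes lowercase whitespace tokenization; the paper's tokenizer may differ.
    """
    tokens = text.lower().split()
    n = len(tokens)
    if n == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def percent_increase(ai: float, human: float) -> float:
    """Relative gap of the AI mean over the human mean, in percent."""
    return (ai - human) / human * 100
```

`percent_increase(7.703, 7.273)` gives about 5.9, and `percent_increase(7.325, 6.762)` gives about 8.3. Both match the Figure 2 caption.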
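
The Figure 4 statistics are means and standard deviations of tokens per sentence. The sketch below assumes a simple regex sentence splitter and whitespace tokens, because the paper's tokenizer is not named here. The helper names `sentence_token_counts` and `mean_sd` are hypothetical.

```python
import re
import statistics


def sentence_token_counts(text: str) -> list:
    """Token count of each sentence, splitting after ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def mean_sd(values: list):
    """Mean and sample standard deviation (statistics.stdev needs at least two values)."""
    return statistics.mean(values), statistics.stdev(values)
```

To get figures in the caption's form (mean ± sample SD), apply `mean_sd` separately to the interviewer and interviewee turns in each condition.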
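
The similarity scores in Figure 5 can be read as averaged cosine similarities between utterance embeddings. The sketch below rests on two assumptions:

- The embeddings come from a sentence-embedding model. Which model the authors used is not stated here.
- Internal similarity is the mean over all pairs of one speaker's utterances, and cross-speaker similarity is the mean over all interviewer-interviewee utterance pairs. Restricting the latter to adjacent question-answer pairs would be an equally plausible reading.

```python
import math
from itertools import combinations


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def internal_similarity(embeddings) -> float:
    """Mean cosine similarity over all pairs of one speaker's utterances (needs >= 2)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def cross_speaker_similarity(interviewer_embs, interviewee_embs) -> float:
    """Mean cosine similarity over every interviewer x interviewee utterance pair."""
    sims = [cosine(q, a) for q in interviewer_embs for a in interviewee_embs]
    return sum(sims) / len(sims)
```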