SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery

David Anugraha, Vishakh Padmakumar, Diyi Yang

TL;DR

SparkMe is a multi-agent LLM interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility. Domain experts rate SparkMe as producing high-quality adaptive interviews that surface helpful profession-specific insights not captured by prior approaches.

Abstract

Qualitative insights from user experiences are critical for informing product and policy decisions, but collecting such data at scale is constrained by the time and availability of experts to conduct semi-structured interviews. Recent work has explored using large language models (LLMs) to automate interviewing, yet existing systems lack a principled mechanism for balancing systematic coverage of predefined topics with adaptive exploration, or the ability to pursue follow-ups, deep dives, and emergent themes that arise organically during conversation. In this work, we formulate adaptive semi-structured interviewing as an optimization problem over the interviewer's behavior. We define interview utility as a trade-off between coverage of a predefined interview topic guide, discovery of relevant emergent themes, and interview cost measured by length. Based on this formulation, we introduce SparkMe, a multi-agent LLM interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility. We evaluate SparkMe through controlled experiments with LLM-based interviewees, showing that it achieves higher interview utility, improving topic guide coverage (+4.7% over the best baseline) and eliciting richer emergent insights while using fewer conversational turns than prior LLM interviewing approaches. We further validate SparkMe in a user study with 70 participants across 7 professions on the impact of AI on their workflows. Domain experts rate SparkMe as producing high-quality adaptive interviews that surface helpful profession-specific insights not captured by prior approaches. The code, datasets, and evaluation protocols for SparkMe are open-source and available at https://github.com/SALT-NLP/SparkMe.
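The abstract frames interview utility as a trade-off between topic-guide coverage, emergent-theme discovery, and interview length, and describes SparkMe as selecting questions with high expected utility via simulated rollouts. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a linear utility with hypothetical weights `alpha`, `beta`, and `gamma`, plus a hypothetical `simulate` function standing in for a conversation rollout. In SparkMe, rollouts are produced by LLM agents, and the paper's exact utility definition and normalization may differ. Expected gain is estimated by averaging the utility change over several sampled rollouts.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class InterviewState:
    covered_subtopics: frozenset  # topic-guide subtopics covered so far
    emergent_themes: frozenset    # relevant emergent themes surfaced so far
    turns: int                    # interview length in turns (the cost term)


# Hypothetical simulator type: given a state and a candidate question, return
# the state after one simulated exchange (an LLM rollout in SparkMe's setting).
Simulator = Callable[[InterviewState, str, random.Random], InterviewState]


def utility(state: InterviewState, n_subtopics: int,
            alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.02) -> float:
    """Assumed linear trade-off: coverage + emergence - length cost."""
    coverage = len(state.covered_subtopics) / n_subtopics
    emergence = len(state.emergent_themes)
    return alpha * coverage + beta * emergence - gamma * state.turns


def expected_gain(state: InterviewState, question: str, simulate: Simulator,
                  n_subtopics: int, n_rollouts: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the utility gain from asking `question`."""
    base = utility(state, n_subtopics)
    gains = [utility(simulate(state, question, rng), n_subtopics) - base
             for _ in range(n_rollouts)]
    return mean(gains)


def select_question(state: InterviewState, candidates: Sequence[str],
                    simulate: Simulator, n_subtopics: int,
                    n_rollouts: int = 8, seed: int = 0) -> str:
    """Pick the candidate with the highest estimated expected utility gain."""
    rng = random.Random(seed)
    return max(candidates, key=lambda q: expected_gain(
        state, q, simulate, n_subtopics, n_rollouts, rng))


def toy_simulate(state: InterviewState, question: str,
                 rng: random.Random) -> InterviewState:
    # Toy stand-in for an LLM rollout: "ask:<subtopic>" covers that subtopic
    # with probability 0.8 and surfaces an emergent theme with probability 0.3.
    subtopic = question.split(":", 1)[1]
    covered = set(state.covered_subtopics)
    emergent = set(state.emergent_themes)
    if rng.random() < 0.8:
        covered.add(subtopic)
    if rng.random() < 0.3:
        emergent.add(f"theme-from-{subtopic}")
    return InterviewState(frozenset(covered), frozenset(emergent), state.turns + 1)


if __name__ == "__main__":
    start = InterviewState(frozenset({"tools"}), frozenset(), turns=2)
    choice = select_question(start, ["ask:tools", "ask:workflow", "ask:concerns"],
                             toy_simulate, n_subtopics=3)
    print("next question:", choice)
```

The length penalty `gamma` is what discourages re-asking about already covered subtopics. A repeated question still costs a turn but adds no coverage, so its expected gain is lower unless it tends to surface emergent themes.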

Paper Structure

This paper contains 76 sections, 9 equations, 18 figures, and 9 tables.

Figures (18)

  • Figure 1: Collecting qualitative data at scale with LLM-based systems requires balancing coverage of a predefined topic guide with exploration of emergent conversational themes without unnecessarily burdening the interviewee. We formalize this trade-off as a tractable utility function that guides the design of interviewer agents (see the interview formulation section). Motivated by the lack of explicit mechanisms for emergence in prior systems, we design SparkMe to prioritize both coverage and emergence (see the system design section) by periodically simulating conversation rollouts and selecting directions with high expected utility gain.
  • Figure 2: ExplorationPlanner (EP) runs asynchronously every $k$ turns, simulating multiple conversation rollouts and scoring them by expected utility gain to propose conversation directions for prioritization.
  • Figure 3: Predefined subtopic coverage and utility of subtopics ($y$-axis) as a function of the number of interview turns ($x$-axis) for different systems (see the baselines section), using Qwen3-30B-A3B-Instruct-2507 as the backbone for both the interviewer LLM and the user agent. SparkMe (Ours) converges efficiently to higher coverage and utility values, consistently outperforming the other baselines (see the automated evaluation findings).
  • Figure 4: The first two panels show the Flesch-Kincaid Grade Level and Flesch Reading Ease of the questions in the automated evaluation. The last three panels show local coherence, transition quality, and follow-up contingency of the interview questions, rated on a 1–5 Likert scale to evaluate the overall quality and flow of the interviews.
  • Figure 5: Number of emergent subtopics identified and covered with and without EP. Other baselines are not shown because they cover 0 emergent subtopics under our evaluation setup.
  • ...and 13 more figures
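Figure 4 scores interview questions with the Flesch-Kincaid Grade Level (higher means harder to read) and Flesch Reading Ease (higher means easier to read). For reference, the sketch below computes both using their standard published coefficients. The syllable counter is a crude vowel-group heuristic assumed here for illustration. This excerpt does not say which readability tooling the paper used, and dedicated readability libraries typically count syllables more carefully, so scores from this sketch may differ from the reported ones.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a silent trailing "e" (e.g. "make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return grade, ease


question = "How has AI changed the way you review code? Can you give an example?"
print(readability(question))
```

Both metrics depend only on average sentence length and average syllables per word. Short, plain interview questions therefore score a lower grade level and a higher reading ease.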