Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale

Max M. Lang; Sol Eskenazi

TL;DR

The paper addresses the resource intensity of traditional telephone surveys by presenting a scalable AI-based telephone interviewer that uses an STT-LLM-TTS pipeline to conduct full-duplex interviews. It evaluates a US pilot ($n=75$) and a Peru deployment ($n=2{,}739$), applying AAPOR-style completion metrics ($RR_1$, $RR_2$), comparing data quality on structured items against human enumerators, and examining qualitative depth. Key contributions include a working end-to-end AI interviewing system, a large-scale real-world deployment, and a detailed analysis of deployment logistics, safety safeguards, and data quality trade-offs. The main finding is that data quality on structured items approaches human performance, while qualitative probing remains limited. The work demonstrates the practical viability of AI-powered phone surveys at scale for consistent data collection in market research, social science, and public opinion research, and it identifies probing depth, multilingual support, and end-to-end automation as areas for future improvement.
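
As a rough illustration of the AAPOR-style completion metrics mentioned above, the sketch below computes $RR_1$ and $RR_2$ from final disposition counts using the standard AAPOR definitions: $RR_1$ counts only complete interviews in the numerator, while $RR_2$ also counts partials. The class name `Dispositions`, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final case dispositions, following AAPOR Standard Definitions symbols."""
    complete: int           # I: complete interviews
    partial: int            # P: partial interviews
    refusal: int            # R: refusals and break-offs
    non_contact: int        # NC: non-contacts
    other: int              # O: other eligible non-interviews
    unknown_household: int  # UH: unknown if household/occupied
    unknown_other: int      # UO: unknown eligibility, other

    def denominator(self) -> int:
        # (I + P) + (R + NC + O) + (UH + UO)
        return (
            self.complete + self.partial
            + self.refusal + self.non_contact + self.other
            + self.unknown_household + self.unknown_other
        )


def rr1(d: Dispositions) -> float:
    """AAPOR Response Rate 1: completes over all eligible and unknown cases."""
    denom = d.denominator()
    if denom == 0:
        raise ValueError("no cases to compute a response rate from")
    return d.complete / denom


def rr2(d: Dispositions) -> float:
    """AAPOR Response Rate 2: completes plus partials over the same base."""
    denom = d.denominator()
    if denom == 0:
        raise ValueError("no cases to compute a response rate from")
    return (d.complete + d.partial) / denom


# Hypothetical counts, for illustration only (not the study's data).
example = Dispositions(
    complete=50, partial=10, refusal=20, non_contact=10,
    other=0, unknown_household=5, unknown_other=5,
)
example_rr1 = rr1(example)
example_rr2 = rr2(example)
```

Because both rates share a denominator, $RR_2 \ge RR_1$ always holds, and the gap between them is the share of cases that ended as partial interviews.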

Abstract

Telephone surveys remain a valuable tool for gathering insights but typically require substantial resources for training and coordinating human interviewers. This work presents an AI-driven telephone survey system integrating text-to-speech (TTS), a large language model (LLM), and speech-to-text (STT) that mimics the versatility of human-led interviews (full-duplex dialogues) at scale. We tested the system across two populations, a pilot study in the United States (n = 75) and a large-scale deployment in Peru (n = 2,739), inviting participants via web-based links and contacting them via direct phone calls. The AI agent successfully administered open-ended and closed-ended questions, handled basic clarifications, and dynamically navigated branching logic, allowing fast large-scale survey deployment without interviewer recruitment or training. Our findings demonstrate that while the AI system's probing for qualitative depth was more limited than that of human interviewers, overall data quality approached human-led standards for structured items. This study represents one of the first successful large-scale deployments of an LLM-based telephone interviewer in a real-world survey context. The AI-powered telephone survey system has the potential to expand scalable, consistent data collection across market research, social science, and public opinion studies, improving operational efficiency while maintaining data quality appropriate for research.
