Keeping Users Engaged During Repeated Administration of the Same Questionnaire: Using Large Language Models to Reliably Diversify Questions

Hye Sun Yun, Mehdi Arjmand, Phillip Sherlock, Michael K. Paasche-Orlow, James W. Griffith, Timothy Bickmore

TL;DR

The paper addresses respondent fatigue in repeated virtual agent (VA)-administered self-report questionnaires by using large language models (LLMs) to generate diverse, psychometrically valid item variants. In a 15-day randomized longitudinal study, the authors show that LLM-generated variants maintain reliability and convergent validity against an external criterion (PHQ-8) while being perceived as less repetitive than a standard questionnaire. They also test whether adding LLM-generated small talk and humor improves engagement, and find no significant gains over delivering the variants alone. The results suggest that LLM-driven diversification can scale longitudinal patient-reported outcome (PRO) data collection and keep it engaging without compromising data quality, although careful prompt design and safety filtering are essential and conversational content is not guaranteed to help.

Abstract

Standardized, validated questionnaires are vital tools in research and healthcare, offering dependable self-report data. Prior work has revealed that virtual agent-administered questionnaires are almost equivalent to self-administered ones in an electronic form. Despite being an engaging method, repeated use of virtual agent-administered questionnaires in longitudinal or pre-post studies can induce respondent fatigue, impacting data quality via response biases and decreased response rates. We propose using large language models (LLMs) to generate diverse questionnaire versions while retaining good psychometric properties. In a longitudinal study, participants interacted with our agent system and responded daily for two weeks to one of the following questionnaires: a standardized depression questionnaire, question variants generated by LLMs, or question variants accompanied by LLM-generated small talk. The responses were compared to a validated depression questionnaire. Psychometric testing revealed consistent covariation between the external criterion and focal measure administered across the three conditions, demonstrating the reliability and validity of the LLM-generated variants. Participants found that the variants were significantly less repetitive than repeated administrations of the same standardized questionnaire. Our findings highlight the potential of LLM-generated variants to invigorate agent-administered questionnaires and foster engagement and interest, without compromising their validity.
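As a rough illustration of what "consistent covariation between the external criterion and focal measure" could look like in practice, the sketch below computes a Pearson correlation between focal-measure scores and criterion scores separately for each study condition. This is not the authors' analysis: the abstract does not state which psychometric procedure was used, and the function names, condition labels, and scores here are hypothetical toy values chosen only to make the example run.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlations_by_condition(scores):
    """Map each condition to the focal-vs-criterion correlation."""
    return {
        condition: pearson_r(focal, criterion)
        for condition, (focal, criterion) in scores.items()
    }


# Hypothetical toy data (NOT from the paper): per-participant totals on the
# focal measure and on the external criterion, grouped by study condition.
toy_scores = {
    "standard": ([4, 9, 13, 18, 22], [3, 8, 11, 17, 20]),
    "variants": ([5, 8, 14, 17, 23], [4, 9, 12, 16, 21]),
    "variants_small_talk": ([3, 10, 12, 19, 21], [2, 9, 13, 18, 19]),
}

if __name__ == "__main__":
    for condition, r in correlations_by_condition(toy_scores).items():
        print(f"{condition}: r = {r:.3f}")
```

Similar correlations across the three conditions would be in line with the abstract's claim that the variants preserve validity; a real analysis would also need confidence intervals and a model that accounts for repeated daily measurements.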

Paper Structure (22 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: A screenshot of the agent waiting for the user to respond after asking a depression questionnaire question. The dialogue response options are displayed at the top right corner of the screen.
  • Figure 2: A workflow diagram of how LLMs were used to generate diverse questions. A simple example is provided.