
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data

Jing Wei, Sungdong Kim, Hyunhoon Jung, Young-Ho Kim

TL;DR

This paper investigates how prompt design for large language models can power chatbots to collect user self-reported health data through natural conversations. It compares four prompt designs (Structured/Descriptive × with/without a personality modifier) across four health topics in an online study with N = 48 participants, in which the chatbots achieved a slot filling rate of 79%. The study shows that prompt format, topic, and conversation path significantly influence both data collection performance and conversational style, including empathy-related behaviors. The findings offer practical guidance for building low-cost, LLM-driven chatbots for personal informatics and outline ethical considerations, limitations, and avenues for future improvement.
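The 2 × 2 prompt design (information format × personality modifier) can be sketched as a simple enumeration of conditions. This is a minimal illustration under stated assumptions, not the authors' prompts: the `build_prompt` helper, the slot names for the Food intake topic, and the personality sentence are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical slots for the Food intake topic (placeholder names, not from the paper).
FOOD_SLOTS = ["meal_time", "food_items", "portion", "satisfaction"]

# Hypothetical personality modifier; the paper's actual wording is not reproduced here.
PERSONALITY = "You are a warm, friendly, and encouraging conversational partner."


def build_prompt(topic: str, slots: list[str], fmt: str, with_personality: bool) -> str:
    """Compose one prompt variant from the two design factors (illustrative only)."""
    if fmt == "structured":
        # Structured format: slots listed as explicit fields.
        body = f"Topic: {topic}\nInformation to collect:\n" + "\n".join(f"- {s}" for s in slots)
    elif fmt == "descriptive":
        # Descriptive format: the same slots woven into a prose instruction.
        body = (f"You are a chatbot that talks with the user about {topic} "
                f"and asks about their {', '.join(slots)}.")
    else:
        raise ValueError(f"unknown format: {fmt}")
    # The personality modifier is prepended only in the "with" conditions.
    return f"{PERSONALITY}\n{body}" if with_personality else body


# Enumerate the four conditions: {structured, descriptive} x {without, with personality}.
conditions = {
    (fmt, pers): build_prompt("food intake", FOOD_SLOTS, fmt, pers)
    for fmt, pers in product(["structured", "descriptive"], [False, True])
}
```

Keeping the slot list as a single input to both formats means the two formats differ only in presentation, which mirrors the factorial idea of varying one factor at a time.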

Abstract

Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts to power chatbots to carry on naturalistic conversations while pursuing a given goal, such as collecting self-report data from users. We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this aim, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) where participants conversed with chatbots driven by different designs of prompts, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the designs of prompts and topics significantly influenced the conversation flows and the data collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.
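The slot filling rate can be read as the fraction of predefined information slots that a conversation actually covered. The sketch below assumes that each conversation is annotated with the set of slots it filled; the `Conversation` type, the placeholder slot names, and the per-conversation averaging are illustrative assumptions, and the paper's exact aggregation may differ (its significance tests rely on the models described in its appendices).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    """One annotated chatbot session (hypothetical structure for illustration)."""
    topic: str
    target_slots: frozenset[str]  # slots the prompt asked the chatbot to collect
    filled_slots: frozenset[str]  # slots an annotator marked as collected

    def slot_filling_rate(self) -> float:
        if not self.target_slots:
            raise ValueError("conversation has no target slots")
        # Only filled slots that were actually targeted count toward the rate.
        return len(self.filled_slots & self.target_slots) / len(self.target_slots)


def mean_slot_filling_rate(conversations: list[Conversation]) -> float:
    """Average of per-conversation rates (one possible aggregation)."""
    return mean(c.slot_filling_rate() for c in conversations)


# Placeholder slot names s1..s4 for two four-slot topics.
four_slots = frozenset({"s1", "s2", "s3", "s4"})
convs = [
    Conversation("work", four_slots, frozenset({"s1", "s2", "s3"})),
    Conversation("exercise", four_slots, four_slots),
]
print(mean_slot_filling_rate(convs))  # 0.875
```

Averaging per conversation weights every session equally; pooling all filled slots over all target slots instead would weight topics with more slots more heavily.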

Paper Structure

This paper contains 45 sections, 10 figures, and 16 tables.

Figures (10)

  • Figure 1: Prompt design combining two factors, information format and personality modifier, in the Food intake topic.
  • Figure 2: 95% confidence intervals of the slot filling rate by variables with a significant effect: (a) study condition (the combination of information format and personality modifier); (b) topic; and (c) conversation path. Asterisks with arms mark significant differences between the connected categories. (See the appendix on slot filling rate statistics for model details and statistics.)
  • Figure 3: Breakdown of the percentage of filled slots by question order for each topic. The Work and Exercise topics consist of four slots each.
  • Figure 4: 95% confidence intervals of the turn ratios of RQ (top; a–d) and SQ (bottom; e–h) by variables with a significant effect. Asterisks with arms mark significant differences between the connected categories; significance across topics is not displayed in (d) and (h). (See the appendices on RQ and SQ statistics for model details and statistics.)
  • Figure 5: 95% confidence intervals of the chatbot turn ratios for Acknowledging (a–d), Appreciating (e–h), and Sympathizing (i–l) behaviors by information format, personality modifier, study condition (combination of format and personality modifier), and conversation path. Variables without a significant effect are marked 'NS.' (See the appendix on empathy-behavior statistics for model details and statistics.)
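Figures 4 and 5 report chatbot behaviors as turn ratios. One plausible reading, assumed here rather than taken from the paper, is the share of a conversation's chatbot turns that carry a given behavior code. The code labels below mirror the behavior names in the Figure 5 caption, while the data layout and the `turn_ratio` helper are hypothetical; a single turn may carry several codes or none.

```python
from collections.abc import Sequence

# Each chatbot turn is the set of behavior codes an annotator assigned to it
# (hypothetical layout; a turn can carry zero or several codes).
Turn = frozenset[str]


def turn_ratio(chatbot_turns: Sequence[Turn], code: str) -> float:
    """Fraction of chatbot turns in one conversation annotated with `code`."""
    if not chatbot_turns:
        raise ValueError("conversation has no chatbot turns")
    # Booleans sum as 0/1, so this counts turns that contain the code.
    return sum(code in turn for turn in chatbot_turns) / len(chatbot_turns)


turns = [
    frozenset({"Acknowledging"}),
    frozenset({"Acknowledging", "Sympathizing"}),
    frozenset(),
    frozenset({"Appreciating"}),
]
for code in ("Acknowledging", "Appreciating", "Sympathizing"):
    print(code, turn_ratio(turns, code))
```

Normalizing by the number of chatbot turns, rather than reporting raw counts, keeps conversations of different lengths comparable.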