"You tell me": A Dataset of GPT-4-Based Behaviour Change Support Conversations

Selina Meyer, David Elsweiler

TL;DR

This work addresses the limited understanding of user behaviour in counselling-style, LLM-based dialogue by releasing a GPT-4-based dataset that contrasts MI-adapted and non-MI prompting across 12-turn sessions for three target behaviours. The study uses a preregistered online design with 164 German-speaking participants, 185 chats, and 2149 turns, and collects pre/post measures and per-turn ratings to examine user-system interactions. Its contributions are multilingual data with rich behavioural and perception metrics, plus an MI-prompting framework for studying the controllability and efficacy of LLM-based behaviour change support. The dataset enables analysis of user expectations, information needs, and the impact of MI-adapted prompts on engagement and readiness to change, which can inform safer and more effective design of social-influence conversational agents.

Abstract

Conversational agents are increasingly used to address emotional needs in addition to information needs. One use case of growing interest is counselling-style mental health and behaviour change interventions, where large language model (LLM)-based approaches are becoming more popular. Research in this context has so far been largely system-focused, neglecting user behaviour and the impact it can have on LLM-generated texts. To address this issue, we share a dataset of text-based user interactions related to behaviour change with two GPT-4-based conversational agents, collected in a preregistered user study. The dataset includes conversation data, user language analysis, perception measures, and user feedback on LLM-generated turns, and can offer valuable insights to inform the design of such systems based on real interactions.

Paper Structure (12 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Each user interacts with one of two systems, where one system is prompted to adhere to Motivational Interviewing principles. Users interact with the systems for 12 turns. User turns are classified with respect to their implications regarding motivation for behaviour change. Each GPT-generated bot turn is rated as helpful, unhelpful, or harmful by the user, with an optional rating explanation.
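
The study design in Figure 1 can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and the helper `rating_counts` are hypothetical and are not taken from the released dataset, whose actual file format and column names may differ. It assumes each chat belongs to one of two systems ("MI" or "non-MI") and that each bot turn carries one of the three ratings named in the caption, with an optional explanation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Rating labels as named in the Figure 1 caption.
RATINGS = ("helpful", "unhelpful", "harmful")


@dataclass
class BotTurn:
    """One GPT-generated turn with the user's rating (hypothetical schema)."""
    text: str
    rating: str                        # one of RATINGS
    explanation: Optional[str] = None  # the rating explanation is optional

    def __post_init__(self) -> None:
        if self.rating not in RATINGS:
            raise ValueError(f"unknown rating: {self.rating!r}")


@dataclass
class Chat:
    """One conversation with one of the two systems (hypothetical schema)."""
    system: str                        # assumed labels: "MI" or "non-MI"
    bot_turns: List[BotTurn] = field(default_factory=list)


def rating_counts(chats: List[Chat]) -> Dict[str, Counter]:
    """Count bot-turn ratings separately for each system."""
    counts: Dict[str, Counter] = {}
    for chat in chats:
        per_system = counts.setdefault(chat.system, Counter())
        for turn in chat.bot_turns:
            per_system[turn.rating] += 1
    return counts
```

With records in this shape, comparing how often turns from the MI-prompted and the non-MI system were rated helpful, unhelpful, or harmful reduces to comparing the two counters returned by `rating_counts`.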