How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

Shih-Hong Huang; Ya-Fang Lin; Zeyu He; Chieh-Yang Huang; Ting-Hao 'Kenneth' Huang

TL;DR

This study investigates how controlling conversation length affects user satisfaction with LLM-powered chatbots. The authors built two Slackbots (SlackVanilla and MultiSlack) with adjustable follow-up turns, tested them on 40 questions grouped by conversability, and measured the effects through participant self-reports and MTurk evaluations. Results indicate that longer conversations can boost satisfaction and perceived helpfulness for high-conversability questions, but the benefits are not universal, and longer conversations can introduce repetition or ambiguity. The work highlights the potential and limits of length-adaptive conversational agents and suggests adaptive strategies guided by user context, with future avenues for automatic situational awareness and privacy-aware designs.

Abstract

Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary conversation lengths and conducted a user study. Participants asked the chatbots both highly and less conversable questions, engaging in dialogues with 0, 3, 5, and 7 conversational turns. We found that the conversation quality does not differ drastically across different conditions, while participants had mixed reactions. Our study demonstrates LLMs' ability to change conversation length and the potential benefits for users resulting from such changes, but we caution that changes in text form may not necessarily imply changes in quality or content.

Paper Structure (28 sections, 3 figures, 6 tables)

Introduction
Backgrounds
User Study
Study Design
Configuring the Slackbots
Study Procedure
Participants
Findings
Participant Self-Reported Response Analysis
As the conversation length increased, satisfaction levels for high-conversability questions also rose.
The helpfulness of responses to high-conversability questions increased with increasing conversation length.
Participants may believe high-conversability questions necessitate more questions from the assistant.
MultiSlack was preferred over SlackVanilla but with varying opinions from participants.
MTurk Response Analysis
Longer conversations may not be inherently better for high-conversability questions.
...and 13 more sections

Figures (3)

Figure 1: Rating distribution of the participant self-reported responses. The rating pair that passed the Chi-squared Test is denoted by $\diamondsuit$ (p-value = 0.05).
Figure 2: Rating distribution of the MTurk evaluation. Rating pairs that passed the Chi-squared Test are denoted by $\dagger$ (p-value = 0.046), $\bigstar$ (p-value = 0.026), $\diamondsuit$ (p-value = 0.041), $\clubsuit$ (p-value = 0.013), $\heartsuit$ (p-value = 0.01), $\spadesuit$ (p-value = 0.022), $\bigcirc$ (p-value = 0.015), # (p-value = 0.009), $\blacksquare$ (p-value = 0.032), and $\triangle$ (p-value = 0.005).
Figure 3: Interface for MTurk workers.
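
The captions for Figures 1 and 2 mark rating pairs that passed a chi-squared test. As a rough illustration of what such a comparison involves, the sketch below runs a Pearson chi-squared test of independence on the rating counts from two conditions. It is not the authors' analysis code: the function names `chi2_statistic` and `chi2_sf`, the five-point rating scale, and the counts in `ratings` are all hypothetical, and the page does not say whether the paper applied any corrections.

```python
import math


def chi2_statistic(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts. Every row and column is
    assumed to contain at least one observation, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Return P(X > x) for a chi-squared variable with a positive integer `dof`."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range((dof - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return total


# Hypothetical rating counts (1 to 5) for two conversation-length conditions.
# These numbers are made up for illustration and are not data from the paper.
ratings = [
    [2, 4, 8, 10, 6],   # condition A
    [1, 2, 5, 12, 10],  # condition B
]

stat, dof = chi2_statistic(ratings)
p_value = chi2_sf(stat, dof)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")
```

Under this reading, a pair of conditions "passes" when the p-value is at or below the chosen significance threshold. With sparse rating categories, expected counts can become small and the chi-squared approximation less reliable, so merging categories or using an exact test may be preferable in that situation.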
