User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT

Anna Bodonhelyi, Efe Bozkir, Shuo Yang, Enkelejda Kasneci, Gjergji Kasneci

TL;DR

GPT-4 outperforms GPT-3.5 in recognizing common intents but is often outperformed by GPT-3.5 in recognizing less frequent intents, and whenever the user intent is correctly recognized, users are more satisfied with the intent-based reformulations of GPT-4 compared to GPT-3.5.

Abstract

The rapid evolution of LLMs represents an impactful paradigm shift in digital interaction and content engagement. While they encode vast amounts of human-generated knowledge and excel in processing diverse data types, they often face the challenge of accurately responding to specific user intents, leading to user dissatisfaction. Based on a fine-grained intent taxonomy and intent-based prompt reformulations, we analyze the quality of intent recognition and user satisfaction with answers from intent-based prompt reformulations of GPT-3.5 Turbo and GPT-4 Turbo models. Our study highlights the importance of human-AI interaction and underscores the need for interdisciplinary approaches to improve conversational AI systems. We show that GPT-4 outperforms GPT-3.5 in recognizing common intents but is often outperformed by GPT-3.5 in recognizing less frequent intents. Moreover, when the user intent is correctly recognized, users are more satisfied with the intent-based reformulations of GPT-4 than with those of GPT-3.5; however, they tend to be more satisfied with the models' answers to their original prompts than with the answers to the reformulated ones. The data collected in our study are publicly available on GitHub (https://github.com/ConcealedIDentity/UserIntentStudy) for further research.

Paper Structure (26 sections, 12 figures, 10 tables, 1 algorithm)

Figures (12)

  • Figure 1: Overview of the proposed prompt reformulation framework, a visual summary of Algorithm 1.
  • Figure 2: Details about the exact intent classification, represented in confusion matrices.
  • Figure 3: Data distribution corresponding to the fine-granular intent categories.
  • Figure 4: Example of the first part of the study.
  • Figure 5: Example of intent prediction. For the participants, the complete fine-grained intent categories were provided.
  • ...and 7 more figures