Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level

Yoonsu Kim, Jueon Lee, Seoyoung Kim, Jaehyuk Park, Juho Kim

TL;DR

This study of users’ dissatisfaction with LLM responses finds that users with low knowledge of LLMs tend to experience more accuracy-related dissatisfaction, yet often put minimal effort into addressing it. It also proposes design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLMs.

Abstract

Large language models (LLMs) with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, due to a limited understanding of these large-scale models, users struggle to use this technology and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these methods focus on enhancing the model's performance on specific tasks, and little work has investigated how to deal with the user dissatisfaction that results from the model's responses. Therefore, with ChatGPT as a case study, we examine users' dissatisfaction along with their strategies for addressing it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of dissatisfactory ChatGPT responses from 107 users, together with their detailed recollections of the dissatisfactory experiences, which we released as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, while they rate accuracy-related dissatisfaction as the most severe. We also identified four tactics users employ to address their dissatisfaction and examined how effective those tactics are. We found that users often do not use any tactics to address their dissatisfaction, and even when they do use tactics, 72% of dissatisfaction remained unresolved. Moreover, we found that users with low knowledge of LLMs tend to experience more accuracy-related dissatisfaction, yet they often put minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLMs.

Paper Structure (41 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Overview of our research questions and findings.
  • Figure 2: Screenshot of the data collection system.
  • Figure 3: Normalized co-occurrence matrix of dissatisfaction categories. The value at (i, j) is the frequency with which the category in column j was also selected as a dissatisfaction point when the category in row i was selected.
  • Figure 4: (a) Distribution of tactic categories by dissatisfaction category. (b) Sankey diagram visualizing which of the four tactic categories, or No Tactic, users chose after experiencing each of the dissatisfaction categories. Note that the counts in the Sankey diagram can be greater than the counts in the response-level analysis in Tables \ref{tab:dis} and \ref{tab:tactic_analysis}. This is because one response can include multiple dissatisfaction categories and multiple tactic categories, each of which is counted separately when drawing the Sankey diagram.
  • Figure 5: (a) A Sankey diagram that visualizes whether users resolved their dissatisfaction using each of the tactic categories. (b) An overall visualization of which of the four tactic categories users chose after experiencing each of the dissatisfaction categories, and whether that dissatisfaction was ultimately resolved.
  • ...and 2 more figures
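The normalized co-occurrence matrix described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' code: it assumes row-wise normalization (each cell is the share of responses tagged with the row category that were also tagged with the column category), sets the diagonal to 1.0 by convention, and uses hypothetical category labels and toy data rather than the paper's seven categories or its dataset.

```python
from collections import Counter
from itertools import permutations


def normalized_cooccurrence(responses, categories):
    """Return {row: {col: share}}, where share is the fraction of responses
    tagged with `row` that were also tagged with `col` (assumed normalization)."""
    row_counts = Counter()   # how many responses include each category
    pair_counts = Counter()  # how many responses include both (row, col)
    for selected in responses:
        selected = set(selected)
        row_counts.update(selected)
        # All ordered pairs of distinct categories within one response.
        pair_counts.update(permutations(selected, 2))

    matrix = {}
    for i in categories:
        matrix[i] = {}
        for j in categories:
            if row_counts[i] == 0:
                matrix[i][j] = 0.0  # category never selected
            elif i == j:
                matrix[i][j] = 1.0  # diagonal: a convention chosen for this sketch
            else:
                matrix[i][j] = pair_counts[(i, j)] / row_counts[i]
    return matrix


# Hypothetical labels and toy responses, for illustration only.
categories = ["intent", "accuracy", "depth"]
responses = [
    {"intent", "accuracy"},
    {"intent"},
    {"accuracy", "depth"},
    {"intent", "accuracy", "depth"},
]
matrix = normalized_cooccurrence(responses, categories)
```

Under this normalization the matrix is generally asymmetric: in the toy data, every response tagged "depth" is also tagged "accuracy", but only two of the three "accuracy" responses are tagged "depth".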