Understanding User Experience in Large Language Model Interactions

Jiayin Wang; Weizhi Ma; Peijie Sun; Min Zhang; Jian-Yun Nie

TL;DR

This paper reframes LLM evaluation from model-centric benchmarks to user-centric human-AI collaboration, developing a seven-intent taxonomy for general LLM interfaces grounded in literature, real-world logs, and a 411-participant survey. It reveals distinct usage patterns, satisfaction levels, and concerns across languages, highlighting the prominence of Text Assistant, Information Retrieval, and Problem-Solving intents while emphasizing underexplored subjective uses like Seek Creativity and Advice. The study identifies 11 empirical insights and outlines six future directions, including personalization, tool integration, and cross-linguistic development, to better align LLMs with real-world human needs. The work advocates user-centered evaluation and design as essential for building resonant, trustworthy, and practically impactful LLM services across diverse use cases and cultures.

Abstract

In the rapidly evolving landscape of large language models (LLMs), most research has primarily viewed them as independent individuals, focusing on assessing their capabilities through standardized benchmarks and enhancing their general intelligence. This perspective, however, tends to overlook the vital role of LLMs as user-centric services in human-AI collaboration. This gap in research becomes increasingly critical as LLMs become more integrated into people's everyday and professional interactions. This study addresses the important need to understand user satisfaction with LLMs by exploring four key aspects: comprehending user intents, scrutinizing user experiences, addressing major user concerns about current LLM services, and charting future research paths to bolster human-AI collaborations. Our study develops a taxonomy of 7 user intents in LLM interactions, grounded in analysis of real-world user interaction logs and human verification. Subsequently, we conduct a user survey to gauge their satisfaction with LLM services, encompassing usage frequency, experiences across intents, and predominant concerns. This survey, compiling 411 anonymous responses, uncovers 11 first-hand insights into the current state of user engagement with LLMs. Based on this empirical analysis, we pinpoint 6 future research directions prioritizing the user perspective in LLM developments. This user-centered approach is essential for crafting LLMs that are not just technologically advanced but also resonate with the intricate realities of human interactions and real-world applications.

Paper Structure (38 sections, 11 figures, 1 table)

Introduction
Related Work
User Intent Analysis
Evaluation of Large Language Models
Empirical Studies on human-AI Collaborations
Real-world User Intents for Engaging with Large Language Models (RQ 1)
Taxonomy Development
Step 1: Generation Based on Related Literature
Step 2: Validation through Real-World Logs
Step 3: Testing via User Survey
Classification result
Design of User Study
Questionnaires
Participants
Results on User Engagement with LLMs (RQ 2 and 3)
...and 23 more sections

Figures (11)

Figure 1: In this work, we (1) propose a taxonomy of user intents when engaging with large language model interfaces, (2) design and conduct a survey to understand user satisfaction with current LLMs, (3) derive 11 findings on usage frequency, user experience, and concerns with LLMs, and (4) discuss 6 research directions for future human-AI collaboration studies.
Figure 2: User Intent Taxonomy.
Figure 3: Usage Frequency of the LLM-powered interfaces. Results show that a great number of users interact with large language models on a daily basis.
Figure 4: User Intent Distribution: the percentage of users who reported using LLMs under each intent. The intents are ranked from top to bottom according to their frequency of usage in the Chinese questionnaire.
Figure 5: Pairwise Relationships between Intents: we run a chi-square test on each pair of intents to check whether user engagement with one depends on engagement with the other. Pairs with a p-value below 0.05 are marked as significantly associated. The analysis groups the seven intents into three clusters: Objective Usage through GUIs, Subjective Usage through GUIs, and Usage through APIs.
...and 6 more figures
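
The pairwise test described in the Figure 5 caption can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes each survey response is reduced to the set of intents a participant reported using, builds a 2x2 contingency table for every pair of intents, and applies a Pearson chi-square test without continuity correction (the source does not say whether a correction was used). The function names, the intent labels "A", "B", "C", and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # A zero margin means one intent never varies, so nothing can be tested.
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def significant_pairs(responses, alpha=0.05):
    """Return (intent_x, intent_y, statistic, p_value) for pairs with p < alpha.

    `responses` is a list of sets; each set holds the intents one
    participant reported (a hypothetical encoding of the survey data).
    """
    intents = sorted({intent for r in responses for intent in r})
    results = []
    for x, y in combinations(intents, 2):
        both = sum(1 for r in responses if x in r and y in r)
        only_x = sum(1 for r in responses if x in r and y not in r)
        only_y = sum(1 for r in responses if x not in r and y in r)
        neither = len(responses) - both - only_x - only_y
        stat, p_value = chi_square_2x2(both, only_x, only_y, neither)
        if p_value < alpha:
            results.append((x, y, stat, p_value))
    return results


# Toy data: intents "A" and "B" always co-occur; "C" is unrelated to both.
toy = (
    [{"A", "B", "C"}] * 5
    + [{"A", "B"}] * 5
    + [{"C"}] * 5
    + [set()] * 5
)
flagged = significant_pairs(toy)
```

Intents linked by significant pairs could then be grouped into clusters like the three named in the caption. With many pairs tested at once, a multiple-comparison correction may also be worth considering; the caption does not mention one.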
