User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal

Yuhan Liu, Michael J. Q. Zhang, Eunsol Choi

TL;DR

This work systematically investigates implicit user feedback in real-world human–LLM dialogues. It formalizes multi-turn interactions and a feedback ontology, and builds dense, manually annotated datasets on LMSYS and WildChat. It analyzes when feedback arises, its linguistic characteristics, and its potential as a learning signal, finding that prompt toxicity and prompt quality both correlate with feedback patterns. The authors then regenerate model outputs using feedback semantics and train LLMs on the regenerated data. Strong LLMs can help weaker models and yield gains on MTBench, but results on the more complex, real-world benchmark WildBench are mixed. The findings underscore both the promise and the challenges of using implicit, noisy user feedback for scalable alignment in deployed systems, and point to the need for careful consideration of data quality, model strength, and task complexity.

Abstract

Once language models (LMs) are deployed, they can interact with users long-term, ideally evolving based on their feedback. Asking for direct user feedback can be disruptive; thus, we study harvesting implicit user feedback from user-LM interaction logs. We study two user-LM interaction datasets (WildChat and LMSYS). First, we analyze user feedback in the user-LLM conversation logs, providing insights into when and why such feedback occurs. Second, we study harvesting learning signals from such implicit user feedback. Specifically, we study whether incorporating the contents of user feedback (e.g., user wanted clarification), in addition to the polarity of the feedback, can improve the model performance. We observe mixed results, showing this helps in short human-designed questions (MTBench) but not on longer and more complex questions (WildBench). Together, we provide an in-depth study of implicit user feedback, showing its potential and limitations.

Paper Structure

This paper contains 50 sections, 1 equation, 6 figures, 16 tables.

Figures (6)

  • Figure 1: Approaches to improving model responses that elicited negative user feedback. A new model response generated by incorporating the feedback content ($\mathbf{m_i^{sem}}$, bottom right) can align better with the user's intended output than a new model response generated from the initial user input alone ($\mathbf{m_i^{scr}}$, top right).
  • Figure 2: Turn-level distribution over feedback categories from our new densely annotated dataset. We find feedback is commonly found in later turns.
  • Figure 3: Comparison of toxicity level between random user prompts and prompts that trigger positive/negative feedback. In both datasets, the toxicity is slightly higher for responses that elicit positive feedback.
  • Figure 4: Comparison of the quality of randomly sampled user prompts and the quality of prompts that incurred positive/negative feedback (N=1000). In LMSYS, prompts that incur negative or positive feedback are slightly worse than randomly sampled prompts.
  • Figure 5: A real case from existing interaction logs, where the user gives positive feedback on the model's jailbreaking responses.
  • ...and 1 more figure