Naturally Occurring Feedback is Common, Extractable and Useful

Shachar Don-Yehiya, Leshem Choshen, Omri Abend

TL;DR

This work tackles the high cost and limited scalability of collecting explicit human feedback for aligning large language models. It defines a five-category taxonomy of naturally occurring feedback in user–model conversations, demonstrates that such feedback is prevalent (about 30% of chats) and grows with newer models, and develops a method to extract this feedback automatically from large chat corpora. The authors create the Natural Feedback Dataset by processing over 1 million conversations to yield hundreds of thousands of feedback samples, and show that training with this data improves model alignment, verified through human judgments, open-model evaluations, and GPT-based judging. The results suggest naturally occurring feedback is a valuable, scalable complementary source for feedback data, with implications for more efficient RLHF pipelines and real-time feedback integration.

Abstract

Human feedback data is a critical component in developing language models. However, collecting this feedback is costly and ultimately not scalable. Inspired by the way human interlocutors provide spontaneous unsolicited feedback to each other, we propose to extract feedback that users naturally include when interacting with chat models. We manually annotated conversations to confirm the presence of naturally occurring feedback in a standard corpus, finding that as much as 30% of the chats include explicit feedback. Compared to older datasets, naturally occurring feedback is more prevalent in recent conversation datasets, suggesting that more than ever, it can serve as a valuable resource for feedback data. We propose a method for automatically extracting this feedback, and apply it to over 1M conversations to obtain hundreds of thousands of feedback samples. The extracted feedback shows promise: training with it improves over baseline models and enhances model alignment to human preferences.

Paper Structure (44 sections, 8 figures, 1 table)

Figures (8)

  • Figure 1: Naturally occurring feedback example. The feedback contained in the user responses is highlighted.
  • Figure 2: The distribution of feedback categories for the first $300$ conversations in the dataset, as determined by manual annotation. The most frequent categories are "Repeat or Rephrase" and "Ask for Clarification". There are only $9$ cases of "Positive Feedback".
  • Figure 3: Extraction Prompt. We describe the taxonomy and ask the model to output the categories and spans of human responses that contain feedback.
  • Figure 4: Confusion Matrix for the Extracted Feedback. Out of the $101$ manually annotated feedback cases, our automatic method found $58$ and assigned $38$ to the correct category. There is no confusion between "Positive Feedback" and the rest of the categories.
  • Figure 5: Automatically extracted feedback distribution. The automatic and manual extraction (Fig. 2) agree on which categories are more common: "Ask for Clarification" and "Repeat or Rephrase". "Make Aware without Correction" and "Positive Feedback" are the rarest.
  • ...and 3 more figures
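
The counts in the Figure 4 caption can be turned into rough rates with a little arithmetic. The sketch below is illustrative only: the rate names are assumptions introduced here and do not appear in the paper, and it assumes the 38 correctly categorized cases are a subset of the 58 detected ones. Precision cannot be derived this way, because the caption does not report how many extracted spans were false positives.

```python
# Counts taken from the Figure 4 caption.
ANNOTATED = 101  # manually annotated feedback cases
FOUND = 58       # cases the automatic method detected
CORRECT = 38     # detected cases that were also given the right category


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Hypothetical metric names, chosen here for illustration.
detection_recall = rate(FOUND, ANNOTATED)             # share of annotated cases detected
category_accuracy_of_found = rate(CORRECT, FOUND)     # share of detected cases categorized correctly
end_to_end_recall = rate(CORRECT, ANNOTATED)          # share of annotated cases detected and categorized correctly

print(f"detected: {detection_recall}%")
print(f"correct category among detected: {category_accuracy_of_found}%")
print(f"detected and correctly categorized: {end_to_end_recall}%")
```

Under these assumptions, the method detects a little over half of the annotated feedback cases, and about two thirds of the detected cases receive the right category.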