Improving GenIR Systems Based on User Feedback

Qingyao Ai, Zhicheng Dou, Min Zhang

TL;DR

This paper addresses how to improve GenIR systems using user feedback, starting from an extended notion of who the "user" is in the GenIR era. It surveys strategies for injecting feedback into prompts, indexing, fine-tuning, and alignment so that system behavior matches user factors. It analyzes alignment objectives and methods, including RLHF, RLAIF, and RLCF, along with optimization techniques such as PPO, DPO, RRHF, and RAFT, as applied to information access tasks. It also discusses continual learning, learning and ranking in conversational contexts, and prompt learning, and it identifies open challenges such as understanding user intent, data-efficient use of feedback, and privacy.

Abstract

In this chapter, we discuss how to improve GenIR systems based on user feedback. Before describing the approaches, we note that the concept of "user" has been extended in interactions with GenIR systems. We then present the different types of feedback information and the strategies for using them. Next, we highlight alignment techniques in terms of their objectives and methods. Following this, we present various ways of learning from user feedback in GenIR, including continual learning, learning and ranking in the conversational context, and prompt learning. This exploration shows that innovative techniques are being proposed that go beyond traditional methods of utilizing user feedback and that contribute significantly to the evolution of GenIR. We also summarize challenging topics and future directions that require further investigation.

Paper Structure

This paper contains 16 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Different types of indexing as prompt input to GenIR systems proposed by geng2022recommendation.
  • Figure 2: Illustration of an LLM application to summarizing similar documents, provided by dong2023aligning. The distinctive parts of each document are highlighted in different colors.
  • Figure 3: An illustration of example reward collection and optimization methods in LLM alignment.