Towards a Unified View of Preference Learning for Large Language Models: A Survey

Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

TL;DR

This survey decomposes all the strategies in preference learning into four components: model, data, feedback, and algorithm, which offers an in-depth understanding of existing alignment algorithms and opens up possibilities to synergize the strengths of different strategies.

Abstract

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One crucial factor in their success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods remain under-explored, which limits the development of preference alignment. In light of this, we break down the existing popular alignment strategies into different components and provide a unified framework to study them, thereby establishing connections among them. In this survey, we decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and also opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to give readers a comprehensive understanding. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.

Paper Structure

This paper contains 52 sections, 8 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: A unified view and an illustrative example of preference learning for LLMs.
  • Figure 2: Taxonomy of Preference Learning.
  • Figure 3: Examples of preference learning. Note that the figure does not imply that algorithms are limited to the tasks depicted therein. Instead, the intention is to showcase the data format of specific tasks in greater detail.
  • Figure 4: Examples of preference learning strategies with a point-wise loss. As in Figure 3, different methods can be adapted to different tasks.
  • Figure 5: Overview of preference learning. For an LLM $\pi_\theta$ to be aligned with human preferences, we first need to prepare preference data. An environment that reflects human preferences then gives feedback on the preference data. Note that this feedback can be either labels or preferences annotated by humans, or scalars output by a reward model. By feeding the model, data, and feedback to a specific algorithm, we obtain an LLM $\pi_\theta'$ that is aligned with human preferences.
  • ...and 1 more figure
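The four-component view described in the Figure 5 caption (model, data, feedback, algorithm) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not code from the survey: the "model" is a softmax policy over a fixed set of candidate responses per prompt, `toy_reward` is a hypothetical stand-in for a reward model that emits scalar feedback, and the "algorithm" is a simple pair-wise logistic (Bradley-Terry-style) objective on the policy's own log-probabilities, chosen for brevity rather than taken from any specific published method. All names (`ToyPolicy`, `toy_reward`, `label_pair`, `pairwise_step`, `align`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyPolicy:
    """Model component (hypothetical): softmax policy over fixed candidate responses."""
    logits: Dict[str, Dict[str, float]]  # prompt -> {response: logit}

    def log_prob(self, prompt: str, response: str) -> float:
        scores = self.logits[prompt]
        m = max(scores.values())
        # Numerically stable log-sum-exp for the softmax normalizer.
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        return scores[response] - log_z


def toy_reward(prompt: str, response: str) -> float:
    """Feedback component (hypothetical reward model): prefers polite responses."""
    return 1.0 if "please" in response else 0.0


def label_pair(prompt: str, a: str, b: str, reward_fn) -> Tuple[str, str]:
    """Convert scalar feedback into a (chosen, rejected) preference label."""
    return (a, b) if reward_fn(prompt, a) >= reward_fn(prompt, b) else (b, a)


def pairwise_step(policy: ToyPolicy, prompt: str, chosen: str, rejected: str,
                  beta: float = 1.0, lr: float = 0.5) -> float:
    """Algorithm component: one gradient step on -log sigmoid(beta * margin)."""
    margin = policy.log_prob(prompt, chosen) - policy.log_prob(prompt, rejected)
    sig = 1.0 / (1.0 + math.exp(-beta * margin))
    loss = math.log1p(math.exp(-beta * margin))  # equals -log(sig)
    # For a softmax policy, d(margin)/d(logit_k) = 1[k=chosen] - 1[k=rejected],
    # so only the two compared logits move.
    g = beta * (1.0 - sig)
    policy.logits[prompt][chosen] += lr * g
    policy.logits[prompt][rejected] -= lr * g
    return loss


def align(policy: ToyPolicy, data: List[Tuple[str, str, str]], reward_fn,
          epochs: int = 20) -> ToyPolicy:
    """Unified view: model + data + feedback -> algorithm -> aligned model."""
    # Copy the logits so the original policy (pi_theta) is left untouched.
    aligned = ToyPolicy({p: dict(s) for p, s in policy.logits.items()})
    for _ in range(epochs):
        for prompt, a, b in data:
            chosen, rejected = label_pair(prompt, a, b, reward_fn)
            pairwise_step(aligned, prompt, chosen, rejected)
    return aligned


if __name__ == "__main__":
    prompt = "ask for the salt"
    policy = ToyPolicy({prompt: {"salt.": 0.0, "salt, please.": 0.0}})
    aligned = align(policy, [(prompt, "salt.", "salt, please.")], toy_reward)
    # The probability of the polite response rises above its initial 0.5.
    print(math.exp(aligned.log_prob(prompt, "salt, please.")))
```

Swapping any one component (for example, replacing `toy_reward` with human-annotated labels, or `pairwise_step` with a point-wise loss) leaves the rest of the pipeline unchanged, which is the kind of modularity the unified view is meant to expose.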