Data Selection for LLM Alignment Using Fine-Grained Preferences

Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

TL;DR

This work formulates alignment as a direct fine-grained preference optimization problem and introduces preference divergence (PD), which quantifies inter-aspect preference conflicts. It then proposes a simple yet effective data selection strategy that trains efficiently on the subset of data with the most negative PD values.

Abstract

Large language model (LLM) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences is becoming increasingly feasible, existing alignment methods typically work on a single preference and thus struggle with the conflicts inherent in such aggregated datasets. As an early attempt, we propose a data-centric approach that aligns LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as a direct fine-grained preference optimization and introduce preference divergence (PD), which quantifies inter-aspect preference conflicts. Instead of directly tackling the resulting complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy that identifies the subset of data with the most negative PD values for efficient training. We theoretically analyze the loss-bound optimality of our selection strategy and conduct extensive empirical studies across varied settings and datasets, demonstrating that our practical selection method achieves consistent improvement over standard full-data alignment while using as little as 30% of the data. Our work suggests that LLM alignment using fine-grained preferences is highly feasible.
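The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a PD score has already been computed for every training sample; how PD is computed is not specified here, so the scores below are made-up placeholders. The function name `select_most_negative_pd` and the `budget` parameter are hypothetical names chosen for illustration, not the authors' implementation.

```python
def select_most_negative_pd(pd_scores, budget=0.3):
    """Return indices of the `budget` fraction of samples with the most negative PD.

    pd_scores: per-sample PD values (assumed precomputed elsewhere).
    budget:    fraction of the dataset to keep, e.g. 0.3 for 30%.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n = len(pd_scores)
    if n == 0:
        return []
    # Keep at least one sample, otherwise round to the nearest count.
    k = max(1, round(budget * n))
    # Sort indices by ascending PD so the most negative values come first.
    # sorted() is stable, so ties keep their original order.
    order = sorted(range(n), key=lambda i: pd_scores[i])
    return order[:k]


# Placeholder PD scores for ten samples (illustrative values only).
example_scores = [0.5, -1.2, 0.1, -0.3, 2.0, -2.5, 0.0, 0.9, -0.7, 1.1]
selected = select_most_negative_pd(example_scores, budget=0.3)
```

With a 30% budget on ten samples, the sketch keeps the three indices whose scores are lowest; the selected subset would then be passed to a standard alignment training routine in place of the full dataset.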

Paper Structure

This paper contains 84 sections, 48 equations, 17 figures, 14 tables, 1 algorithm.

Figures (17)

  • Figure 1: Conflicts between the fine-grained and overall preferences commonly occur, and only a part of the samples show complete consistency across all fine-grained aspects.
  • Figure 2: The overall workflow of the proposed PD selection method.
  • Figure 3: Performance variation with different selection budgets for settings in § 5.2.
  • Figure 4: Performance variation with different selection budgets for settings in § 5.3.
  • Figure 5: Comparison of different proxy reward models.
  • ...and 12 more figures