Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment

Feifan Song; Bowen Yu; Hao Lang; Haiyang Yu; Fei Huang; Houfeng Wang; Yongbin Li

TL;DR

This paper investigates how to allocate a fixed annotation budget between prompts and responses when fine-tuning LLMs to align with human preferences. It introduces an $N$-gram-based prompt-diversity metric, demonstrates a linear relationship between this diversity and final performance, and shows that increasing response diversity yields larger gains than increasing prompt diversity. The authors validate these findings with automatic reward-model evaluations and GPT-4 judgments, and further show that diversity-guided data augmentation can safely boost alignment performance under budget constraints. Collectively, the work provides practical guidance for data-centric LLM alignment, offering a concrete diversity metric and augmentation strategy to maximize gains from limited human feedback.
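The exact $N$-gram formulation is not reproduced on this page. As a rough illustration only, the sketch below swaps in the familiar distinct-$n$ ratio (unique $n$-grams divided by total $n$-grams over a prompt set) and a hypothetical greedy augmentation loop that adds whichever candidate prompt most increases that ratio. The names `distinct_n` and `greedy_augment`, whitespace tokenization, and the default `n = 2` are all assumptions, not the authors' method.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(prompts: Iterable[str], n: int = 2) -> float:
    """Distinct-n ratio over a prompt set: unique n-grams / total n-grams.

    Stand-in diversity score (an assumption, not the paper's formula).
    Uses lowercase whitespace tokenization; returns 0.0 if no n-grams exist.
    """
    total = 0
    unique = set()
    for prompt in prompts:
        grams = ngrams(prompt.lower().split(), n)
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def greedy_augment(pool: List[str], candidates: List[str], k: int, n: int = 2) -> List[str]:
    """Hypothetical diversity-guided augmentation.

    Adds up to k candidates to the pool, each time choosing the candidate
    whose addition gives the highest distinct-n score (first one on ties).
    """
    selected = list(pool)
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: distinct_n(selected + [c], n))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, starting from the pool `["a b c"]` with candidates `["a b c", "x y z"]`, one greedy step picks `"x y z"`, since a verbatim duplicate halves the distinct-bigram ratio. Note that a ratio like distinct-$n$ is sensitive to prompt length, which is one reason a purpose-built formulation may be preferable.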

Abstract

Alignment with human preferences prevents large language models (LLMs) from generating misleading or toxic content, but it requires costly human feedback. Assuming that human annotation resources are limited, two allocation strategies can be considered: labeling more diverse PROMPTS or more diverse RESPONSES. However, a direct comparison of their impact is lacking. In this work, we first control the diversity on both sides through the number of samples used for fine-tuning, which directly reflects their influence. We find that, rather than numerous prompts, more responses with fewer prompts better trigger LLMs for human alignment. Additionally, prompt diversity is a more complex concept than response diversity, which is typically quantified by a single number. Consequently, we propose a new formulation of prompt diversity, which exhibits a linear correlation with the final performance of LLMs after fine-tuning. We also apply it to data augmentation and conduct experiments to show its effect on different algorithms.
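To make the two allocation directions concrete, the sketch below draws equal-cost training subsets from a preference dataset: one with more prompts and fewer responses each, one with fewer prompts and more responses each. It assumes (hypothetically) that the data is a dict mapping each prompt to its responses ordered from most to least preferred, and that annotation cost is proportional to the number of labeled responses. `build_subset` and `labeled_count` are illustrative names, not part of the paper.

```python
import random
from typing import Dict, List


def build_subset(
    ranked: Dict[str, List[str]],
    num_prompts: int,
    responses_per_prompt: int,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Sample num_prompts prompts and keep the first responses_per_prompt
    responses of each (assumed ordered from most to least preferred)."""
    eligible = sorted(p for p, rs in ranked.items() if len(rs) >= responses_per_prompt)
    if num_prompts > len(eligible):
        raise ValueError("not enough prompts with the requested number of responses")
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    chosen = rng.sample(eligible, num_prompts)
    return {p: ranked[p][:responses_per_prompt] for p in chosen}


def labeled_count(subset: Dict[str, List[str]]) -> int:
    """Annotation-cost proxy (an assumption): total labeled responses."""
    return sum(len(rs) for rs in subset.values())
```

With a budget of 8 labeled responses, `build_subset(data, 4, 2)` expands prompts while `build_subset(data, 2, 4)` expands responses; both have `labeled_count` equal to 8, so any difference after fine-tuning can be attributed to where the diversity was spent.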

Paper Structure (24 sections, 10 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Fine-tuning for Human Alignment
Scaling Analyses of LLMs
Quantitative Experiments
Background
Dataset Construction
Metrics
Benchmark Algorithms
Implementation Details
Results of Automatic Evaluation
GPT-4 Evaluation
The Scaling Law between Prompt Diversity and LLMs Preference
Diversity Formulation
Analysis
...and 9 more sections

Figures (4)

Figure 1: Different directions of data expansion for human alignment: (1) Expanding more prompts; (2) Expanding more responses for each prompt.
Figure 2: Distribution of BLEU scores with different settings.
Figure 3: GPT-4 Evaluation
Figure 4: (a) Linear fitting from different sample amounts to finally acquired rewards of LLMs tuned with PRO. (b) The trend of diversity with the increasing sample amount. (c) Linear fitting from the proposed diversity metric to finally acquired rewards of LLMs tuned with PRO. (d) Linear fitting from the proposed diversity metric to finally acquired rewards of LLMs tuned with SFT.
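Figure 4 reports linear fits from sample amount or the proposed diversity metric to the rewards acquired after tuning. A minimal ordinary-least-squares sketch of such a fit is shown below, returning the slope, intercept, and Pearson correlation; here `xs` would hold diversity scores (or sample amounts) and `ys` the corresponding reward scores. The function name and interface are assumptions, and no values from the paper are included.

```python
from math import sqrt
from typing import List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a * x + b.

    Returns (a, b, r), where r is the Pearson correlation coefficient
    (reported as 0.0 when all y values are identical).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return a, b, r
```

A value of `r` close to 1 indicates that reward grows approximately linearly with the chosen x-variable, which is the kind of relationship panels (c) and (d) describe for the proposed diversity metric.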
