Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

Yuchen Fan, Yuzhong Hong, Qiushi Wang, Junwei Bao, Hongfei Jiang, Yang Song

TL;DR

PoFT introduces a Bradley-Terry (BT) preference objective for supervised fine-tuning that explicitly favors the target model over aligned LLMs on the same SFT data, using the aligned models’ predicted likelihoods as a data-quality signal. The resulting gradient weights each sample by a coefficient that depends on the aligned LLMs’ predicted likelihoods, which improves robustness to quality-limited data and reduces overfitting to noisy samples. Empirical results with Mistral-7B and Llama-3-8B backbones on UltraChat200k, ShareGPT, and OpenHermes show consistent improvements over standard cross-entropy training, with particularly large gains on OpenHermes and notable stability across epochs. PoFT is compatible with data filtering techniques and can be combined with Direct Preference Optimization (DPO), indicating practical utility for building more reliable aligned LLMs in real-world settings.

Abstract

Alignment, endowing a pre-trained large language model (LLM) with the ability to follow instructions, is crucial for its real-world applications. Conventional supervised fine-tuning (SFT) methods formalize it as causal language modeling, typically with a cross-entropy objective, which requires a large amount of high-quality instruction-response pairs. However, the quality of widely used SFT datasets cannot be guaranteed, because creating and maintaining them is costly and labor-intensive in practice. To overcome the limitations associated with the quality of SFT datasets, we introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT. The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. This preference encourages the target model to predict a higher likelihood than that predicted by the aligned LLMs, incorporating assessment information on data quality (i.e., the likelihood predicted by the aligned LLMs) into the training process. Extensive experiments validate the effectiveness of the proposed method: PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models. Moreover, we show that PoFT can be integrated with existing SFT data filtering methods to achieve better performance, and that it can be further improved by subsequent preference optimization procedures, such as DPO.
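
The preference described above can be illustrated with a minimal sketch of a Bradley-Terry style loss on a single SFT example. This is not the authors' implementation: the function names, the use of a sequence-level log-likelihood as the preference score, and the averaging over several aligned LLMs are all assumptions made for illustration. The sketch only shows why the gradient with respect to the target model's score carries a per-sample coefficient that depends on the aligned LLMs' likelihoods.

```python
import math


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def poft_style_loss(target_logp: float, aligned_logps: list[float]) -> float:
    """Bradley-Terry loss -log sigmoid(target score - aligned score).

    Assumption: the preference score is a (possibly length-normalized)
    log-likelihood of the response, and scores from several aligned
    LLMs are combined by a simple average.
    """
    aligned_score = sum(aligned_logps) / len(aligned_logps)
    return _softplus(-(target_logp - aligned_score))


def sample_weight(target_logp: float, aligned_logps: list[float]) -> float:
    """Magnitude of d(loss)/d(target_logp) = sigmoid(aligned - target).

    Examples the aligned LLMs score highly get a larger weight;
    examples they find unlikely (possibly noisy) get a smaller one.
    """
    aligned_score = sum(aligned_logps) / len(aligned_logps)
    return _sigmoid(aligned_score - target_logp)


if __name__ == "__main__":
    # Hypothetical scores: same target score, different aligned-LLM scores.
    print(sample_weight(-2.0, [-0.5, -0.7]))  # aligned LLMs find the sample likely
    print(sample_weight(-2.0, [-3.0, -3.5]))  # aligned LLMs find the sample unlikely
```

Under these assumptions, a sample that the aligned LLMs consider unlikely contributes a smaller gradient than one they consider likely, which is consistent with the TL;DR's description of reduced overfitting to noisy data.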

Paper Structure

This paper contains 24 sections, 14 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: The overall modeling framework of PoFT. By leveraging the Bradley-Terry ranking objective, we impose a particular preference that favors the target model over the aligned LLMs on the same SFT data. Note that the preference score is generated based on the corresponding predicted likelihood.
  • Figure 2: Preference scores generated by aligned LLMs across different training datasets. Note that PDF stands for the probability density function.
  • Figure 3: Analysis and model performances on quality-limited data. (a) Preference score distributions of the quality-limited datasets Alpaca and Dolly, compared to OpenHermes. Note that PDF stands for the probability density function. (b) Performances of PoFT and SFT models trained with Alpaca. (c) Performances of PoFT and SFT models trained with Dolly. (d) Preference score distributions of hand-crafted noise data on OpenHermes. The increase in the long-tail part indicates the distribution of the noise data. (e) Performances of PoFT and SFT models trained with hand-crafted noise data.
  • Figure 4: Performance of Mistral-7B trained with different percentages of data on Open LLM Leaderboard.
  • Figure 5: Performances of bi-PoFT models trained with hand-crafted noise data (50k/100k) across epochs on Open LLM Leaderboard. For comparison, we present the results of PoFT models trained solely with the original SFT data.