A Practical Analysis of Human Alignment with *PO

Kian Ahrabian; Xihui Lin; Barun Patra; Vishrav Chaudhary; Alon Benhaim; Jay Pujara; Xia Song

TL;DR

The paper examines how robust offline preference optimization methods for human alignment are to hyperparameter choices in a realistic out-of-distribution (OOD) setting. It introduces LN-DPO, a length-normalized variant of DPO, and compares the reference-free SimPO with the reference-dependent DPO and LN-DPO. The evaluation uses the SafeRLHF and HH-RLHF datasets with the OpenAssistant reward model, and the paper details the objective formulations and training regimen. At their best, SimPO, LN-DPO, and DPO perform similarly, but their performance trajectories differ as hyperparameters vary; LN-DPO notably reduces response length and KL divergence relative to SFT, enhancing stability. In practice, SimPO is a strong default because of its robustness and length reduction, while LN-DPO is a solid alternative when reference-based regularization or stability is a priority. These findings offer guidance for practitioners with limited hyperparameter search budgets.

Abstract

At the forefront of state-of-the-art human alignment methods are preference optimization methods (*PO). Prior research has often concentrated on identifying the best-performing method, typically involving a grid search over hyperparameters, which can be impractical for general practitioners. In this paper, we examine the robustness of existing state-of-the-art methods to varying hyperparameters in a realistic out-of-distribution (OOD) scenario that mirrors real-world applications of human alignment. Our goal is to empirically find the method that increases the likelihood of achieving better results through the lens of various metrics, such as KL divergence and response length. We also introduce LN-DPO, a simple length-normalized version of DPO that is more stable across hyperparameters, effectively reduces the average response length, and improves performance. Our analysis of state-of-the-art reference-free (i.e., SimPO) and reference-dependent (i.e., DPO and LN-DPO) methods reveals that they perform similarly at their peak (i.e., best possible scenario). However, we uncover that the pattern of change in performance greatly varies as we move away from the best possible scenario.
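
To make the distinction between the three objectives concrete, the sketch below computes per-example losses from summed sequence log-probabilities. It is a minimal illustration, not the paper's implementation: the DPO and SimPO forms follow their commonly published definitions, and the LN-DPO form is an assumption based only on the abstract's description of it as a length-normalized DPO (each policy-to-reference log-ratio is divided by the response length in tokens). All function and argument names are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta):
    """DPO: reference-dependent, no length normalization.

    pol_* / ref_* are summed token log-probs of the chosen (w) and
    rejected (l) responses under the policy and the reference model.
    """
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    return -log_sigmoid(margin)


def ln_dpo_loss(pol_w, pol_l, ref_w, ref_l, len_w, len_l, beta):
    """LN-DPO (assumed form): DPO with each log-ratio divided by
    the token length of its response."""
    margin = beta * ((pol_w - ref_w) / len_w - (pol_l - ref_l) / len_l)
    return -log_sigmoid(margin)


def simpo_loss(pol_w, pol_l, len_w, len_l, beta, gamma):
    """SimPO: reference-free, length-normalized policy log-probs
    with a target reward margin gamma."""
    margin = beta * (pol_w / len_w - pol_l / len_l) - gamma
    return -log_sigmoid(margin)
```

Under this assumed form, LN-DPO differs from SimPO only by the reference-model terms and the absence of the margin `gamma`, which is one way to read the connection between the two methods that the paper discusses.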

Paper Structure (27 sections, 1 equation, 7 figures, 4 tables)

Introduction
Related Work
Experimental Setup
Datasets
Models
Optimization Objectives
Connection between LN-DPO and SimPO
Training Regimen
Metrics
Implementation Details
Experimental Results
Hyperparameter Robustness
Best Performance.
Head-to-head Performance.
Expected Performance.
...and 12 more sections

Figures (7)

Figure 1: *PO Performance Distribution. Each sample in the distribution represents the performance of one set of hyperparameters on the denoted metric. The dashed line indicates the performance of the initial SFT model.
Figure 2: Response Length. The top k% ($k \in \{1,10,25\}$) denotes the percentage of best-performing hyperparameters taken from each method's runs.
Figure 3: KL Divergence. The top k% ($k \in \{1,10,25\}$) denotes the percentage of best-performing hyperparameters taken from each method's runs.
Figure 4: DPO $\beta$. Each point indicates a run with the corresponding $\beta$ value.
Figure 5: LN-DPO $\beta$. Each point indicates a run with the corresponding $\beta$ value.
...and 2 more figures
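
The "top k%" selection used in the captions of Figures 2 and 3 can be sketched as follows. This is an illustrative reading of the captions, not the authors' code: the run records, the key names, and the choice to round the number of kept runs up are all assumptions.

```python
import math


def top_k_percent(runs, k, score_key="score"):
    """Return the best k% of runs by score_key (higher is better).

    Rounding up, and always keeping at least one run, is an assumption.
    """
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    n_keep = max(1, math.ceil(len(runs) * k / 100))
    ranked = sorted(runs, key=lambda r: r[score_key], reverse=True)
    return ranked[:n_keep]


def mean_metric(runs, metric_key):
    """Average a secondary metric (e.g. KL divergence) over the given runs."""
    return sum(r[metric_key] for r in runs) / len(runs)
```

For example, with one record per hyperparameter configuration, `mean_metric(top_k_percent(runs, 10), "kl")` would give the average KL divergence over the top 10% of a method's runs.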
