Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

TL;DR

Linear Alignment addresses the resource-intensive RLHF pipeline by proposing a closed-form, tuning-free method to align LLMs with human preferences in one inference step. It builds a local linear approximation to the policy optimization objective under a divergence constraint and uses Self-Contrastive Decoding to estimate the optimization direction without data labeling or training. The method achieves competitive performance compared to PPO-based RLHF on general preferences and demonstrates strong gains on personalized preferences, with modest increases in inference cost. The work contributes a principled, parameter-free alignment paradigm and an open-source dataset to benchmark personalization in LLMs.

Abstract

The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce Linear Alignment, a novel algorithm that aligns language models with human preferences in a single inference step, eliminating the reliance on data annotation and model training. Linear Alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of the optimal policy in closed form and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that Linear Alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset are published at https://github.com/Wizardcoast/Linear_Alignment.git.
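
The single-step idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the update direction is the difference between next-token logits computed with and without a principle prompt, and it leaves out the divergence constraint that the paper's closed-form solution handles. The names `linear_align_step` and `ratio` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_align_step(base_logits, principle_logits, ratio):
    """Extrapolate the base logits along the self-contrastive direction.

    Assumption: the direction is (principle_logits - base_logits), i.e. the
    small shift caused by prepending a principle prompt. `ratio` is a
    hypothetical step size; the paper derives its update under a divergence
    constraint, which this sketch does not model.
    """
    direction = [p - b for p, b in zip(principle_logits, base_logits)]
    return [b + ratio * d for b, d in zip(base_logits, direction)]


# Toy vocabulary of three tokens.
base = [2.0, 1.0, 0.0]        # logits from the plain prompt
principled = [2.0, 1.5, 0.0]  # logits with a principle prompt prepended

aligned = linear_align_step(base, principled, ratio=3.0)
probs_base = softmax(base)
probs_aligned = softmax(aligned)
```

In this toy case the principle prompt nudges the second token only slightly, and the extrapolation amplifies that nudge, so the second token gains probability mass without any training or labeled feedback.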

Paper Structure (51 sections, 20 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of our Linear Alignment framework with a toy example. The principle prompt (top) has a limited impact on the model policy, resulting in similar responses. However, these small policy differences reflect the gradient of the potential Q-function with respect to the output logits. We then take a one-step optimization towards larger distribution divergence, which constructs a linear approximation for policy optimization.
  • Figure 2: Preference evaluation results judged by GPT-4. We compare Linear Alignment with five baseline methods on Vicuna (left) and Mistral-instruct (right). To eliminate position bias, we evaluated each pair of generated results twice, swapping their order between the two evaluations.
  • Figure 3: Normalized reward distribution of models optimized using PPO and linear alignment with the original SFT model on the test data. We trained a reward model on the HH-RLHF dataset and used it to score the test data sampled from the same distribution.
  • Figure 4: The performance of different alignment models on personal preference datasets across various domains.
  • Figure 5: The impact of different ratios for linear alignment on the personal preference dataset with Mistral-7B-Instruct.
  • ...and 4 more figures