BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization

Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

TL;DR

Base-Anchored Preference Optimization (BAPO) is a simple yet effective approach that uses the reference model's initial responses to mitigate forgetting during personalized alignment. It adapts to diverse user preferences while minimally affecting global knowledge and general alignment.

Abstract

While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models with diverse user preferences makes it harder to preserve previously acquired knowledge. This paper examines the impact of personalized preference optimization on LLMs and shows that the extent of knowledge loss varies significantly with preference heterogeneity. Although previous approaches impose a KL constraint between the reference model and the policy model, we observe that they fail to maintain general knowledge and alignment when facing personalized preferences. To address this, we introduce Base-Anchored Preference Optimization (BAPO), a simple yet effective approach that uses the initial responses of the reference model to mitigate forgetting while accommodating personalized alignment. BAPO adapts to diverse user preferences while minimally affecting global knowledge and general alignment. Our experiments demonstrate the efficacy of BAPO in various setups.
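To make the KL-constrained baseline concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair, together with the per-response log-probability shift between the policy and the reference model (the kind of quantity tracked for Chosen, Base, and Rejected responses). It is not the BAPO objective, which this summary does not state. The function names, the value of `beta`, the sign convention of the shift, and the numbers in the usage lines are all assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    All inputs are sequence-level log-probabilities (sums of token log-probs).
    beta is the strength of the implicit KL constraint (assumed value).
    """
    # Implicit reward margin between the chosen and the rejected response.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def logprob_shift(policy_logprob, ref_logprob):
    """Log-probability shift of one response: policy minus reference.

    The sign convention is an assumption; a negative value means the policy
    now assigns the response less probability than the reference model did.
    """
    return policy_logprob - ref_logprob


# Hypothetical log-probabilities, for illustration only.
loss = dpo_loss(policy_chosen=-8.0, policy_rejected=-12.0,
                ref_chosen=-10.0, ref_rejected=-10.0)
base_shift = logprob_shift(policy_logprob=-15.0, ref_logprob=-11.0)
print(loss, base_shift)
```

Note that the loss only constrains the chosen and rejected responses relative to the reference model. Nothing in it prevents the shift of a third response, such as the reference model's own initial answer, from becoming strongly negative, which is the kind of drift the abstract describes.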

Paper Structure

This paper contains 40 sections, 3 theorems, 10 equations, 14 figures, 4 tables.

Key Result

Proposition 1

Suppose that $\theta_{G}$ is known. Then the sample complexity for estimating $\theta_{L}$ reduces from $O(\sqrt{d})$ to $O(\sqrt{k})$.

Figures (14)

  • Figure 1: Overview of Base-Anchored Preference Optimization (BAPO): For a given user prompt, the base response achieves general alignment. Model A and Model C, fine-tuned with BAPO, maintain this alignment by anchoring to the base response. In contrast, Model B, fine-tuned with DPO, fails to preserve the knowledge from the base response, drifting away from the desired knowledge preservation area. The full example is provided in the appendix.
  • Figure 2: Performance on Global Knowledge: (a) Science QA and (b) MMLU - Humanities after personalization on diverse preferences. The black vertical dotted line indicates the base model performance.
  • Figure 3: Performance on General Alignment: (a) HHH-Honesty and (b) HHH-Harmless after personalization on diverse preferences. The black vertical dotted line indicates the base model performance.
  • Figure 4: An example of responses for a user with P1A (elementary school level) style preference. The full example is provided in the appendix.
  • Figure 5: Average difference between reference model ($\pi_{\text{ref}}$) and policy model ($\pi_{\theta}$) log probabilities for Chosen, Base, and Rejected responses during personalization across four domain preferences in DSP datasets.
  • ...and 9 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Lemma 1
  • Lemma 2