Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment

Jianfei Zhang, Jun Bai, Bei Li, Yanmeng Wang, Rumei Li, Chenghua Lin, Wenge Rong

TL;DR

This work tackles the challenge of efficiently aligning large language models to individual user preferences. It introduces a two-stage approach. The first stage, Contrastive Language–Latent Pretraining (CLaP), extends decoder-only LLMs with a probabilistic latent variable $z$ via a latent encoder $q(z|x,y)$ and a latent adapter $p(y|x,z)$ to disentangle representation from generation. The second stage, Latent Direct Preference Optimization (Latent DPO), learns a personalized latent encoder $p_{\theta}(z|x)$ using offline responses and latent rewards. By applying DPO at the latent level rather than to the full model, the method reduces per-user training time by 80–90% while delivering alignment quality competitive with LoRA- or P-Tuning-based PEFT baselines. Across the IMDB, DailyDialog, and TL;DR summarization tasks, Latent DPO demonstrates strong personalized performance and clear efficiency gains, with additional validation on Llama3-8B showing consistent trends. This work offers a scalable solution for individual preference alignment, enabling large-scale customization without prohibitive computational cost.
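
To make "DPO at the latent level" more concrete, here is a minimal sketch of the standard DPO loss evaluated on latent log-densities instead of token log-probabilities. It is an illustration under assumptions, not the paper's implementation: the diagonal-Gaussian form of the encoder, the function names `diag_gaussian_logpdf` and `latent_dpo_loss`, the value of `beta`, and the example latents are all hypothetical. The point it shows is only that the loss depends on a small encoder's densities over $z$, so no LLM forward pass is needed to evaluate it.

```python
import math

def diag_gaussian_logpdf(z, mean, log_std):
    """Log-density of z under a diagonal Gaussian N(mean, exp(log_std)^2).

    Assumption: the latent encoder outputs a diagonal Gaussian over z.
    """
    total = 0.0
    for zi, mi, lsi in zip(z, mean, log_std):
        var = math.exp(2.0 * lsi)
        total += -0.5 * ((zi - mi) ** 2 / var + 2.0 * lsi + math.log(2.0 * math.pi))
    return total

def latent_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss, -log sigmoid(margin), applied to latent log-densities.

    logp_w / logp_l: log-density of the preferred / dispreferred latent under
    the encoder being trained; ref_* are the same under a frozen reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical 2-d latents for a preferred (z_w) and a dispreferred (z_l) response.
z_w, z_l = [0.5, -0.2], [-1.0, 0.8]
ref_mean, ref_log_std = [0.0, 0.0], [0.0, 0.0]   # frozen reference encoder
pol_mean, pol_log_std = [0.4, -0.1], [0.0, 0.0]  # encoder shifted toward z_w

loss = latent_dpo_loss(
    diag_gaussian_logpdf(z_w, pol_mean, pol_log_std),
    diag_gaussian_logpdf(z_l, pol_mean, pol_log_std),
    diag_gaussian_logpdf(z_w, ref_mean, ref_log_std),
    diag_gaussian_logpdf(z_l, ref_mean, ref_log_std),
)
print(loss)
```

When the trained encoder matches the reference, the margin is zero and the loss equals $\log 2$. Moving the encoder's mass toward the preferred latent, as in the example, pushes the loss below that value.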

Abstract

Aligning Large Language Models (LLMs) with general human preferences has proven crucial for improving the quality of interaction between LLMs and humans. However, human values are inherently diverse across individuals, so aligning LLMs solely with general preferences is insufficient. Personalizing LLMs according to individual feedback is a promising solution, but it raises challenges for the efficiency of alignment algorithms. In this work, we introduce a flexible paradigm for individual preference alignment. Our method fundamentally improves efficiency by disentangling preference representation from text generation in LLMs. We validate our approach across multiple text generation tasks and demonstrate that it achieves alignment quality on par with or better than PEFT-based methods, while reducing the additional training time for each new individual preference by $80\%$ to $90\%$ compared with them.

Paper Structure

This paper contains 42 sections, 11 equations, 9 figures, 18 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Our proposed method aims to offer flexible personalization learning from individual feedback, i.e., automatic individual adaptation in an efficient way.
  • Figure 2: Our method realizes efficient personalization for LLMs through three steps. Step 1 learns the posterior latent encoder (in green) and the latent adapter to disentangle representation and generation. Step 2 learns the personalized latent encoder (in yellow) from individual feedback. Step 3 steers personalized generation from LLMs under the guidance of personalized representations. Of these, only Step 2 must be repeated for each individual user, and it involves computation only in small networks (the latent encoders) rather than in the LLM.
  • Figure 3: Illustration of Eq. \ref{eq_DG_ELBo}, with condition $x$ omitted.
  • Figure 4: Illustration of Eq. \ref{eq_contrastive}, with condition $x$ omitted.
  • Figure 5: Additional training time on each new individual preference for different methods.
  • ...and 4 more figures