Aligning LLMs with Individual Preferences via Interaction

Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji

TL;DR

The paper tackles the challenge of aligning LLMs with individual user preferences by training models to infer implicit user traits over multi-turn interactions. It introduces a scalable pipeline that builds a diverse persona pool and a tree-structured preference dataset via multi-LLM collaboration, then trains models with supervised fine-tuning and reinforcement learning using Direct Preference Optimization. The ALOE benchmark, with 100 validated personas, assesses dynamic personalization and shows substantial improvements over baselines across several open-source LLMs. The work demonstrates that interaction-driven alignment can significantly enhance personalized conversational experiences. It acknowledges practical limits such as a 10-turn cap and suggests future work on longer dialogues and more diverse interactions.

Abstract

As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behaviors with human values and preferences becomes crucial for their wide adoption. While previous research focuses on general alignment to principles such as helpfulness, harmlessness, and honesty, the need to account for individual and diverse preferences has been largely overlooked, potentially undermining customized human experiences. To address this gap, we train LLMs that can "interact to align", essentially cultivating the meta-skill of LLMs to implicitly infer the unspoken personalized preferences of the current user through multi-turn conversations, and then dynamically align their following behaviors and responses to these inferred preferences. Our approach involves establishing a diverse pool of 3,310 distinct user personas by initially creating seed examples, which are then expanded through iterative self-generation and filtering. Guided by distinct user personas, we leverage multi-LLM collaboration to develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. Finally, we apply supervised fine-tuning and reinforcement learning to enhance LLMs using this dataset. For evaluation, we establish the ALOE (ALign With CustOmized PrEferences) benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations. Experimental results demonstrate the effectiveness of our method in enabling dynamic, personalized alignment via interaction.
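
The training signal described above can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that each round of a conversation tree stores one preferred and one rejected response and that the conversation continues along the preferred branch. The per-pair loss is the standard Direct Preference Optimization objective, not necessarily the exact formulation used in the paper. The names `Turn`, `preference_pairs`, and `dpo_loss`, and the value `beta=0.1`, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversation round: a user message plus two candidate replies."""
    user: str
    preferred: str  # reply assumed to match the persona
    rejected: str   # reply assumed to ignore the persona


def preference_pairs(turns):
    """Yield (history, preferred, rejected) for every round.

    Assumption: the history for round k contains the preferred replies of
    all earlier rounds, i.e. the tree is followed along its preferred branch.
    """
    history = []
    for turn in turns:
        history.append(("user", turn.user))
        yield list(history), turn.preferred, turn.rejected
        history.append(("assistant", turn.preferred))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy (logp_*) and the frozen reference model (ref_logp_*)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy favors the preferred reply more strongly than the reference does.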

Paper Structure (35 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Using our approach, LLMs can implicitly infer user profiles and personalities, allowing them to progressively tailor responses to align with individual preferences.
  • Figure 2: Iterative self-generation and semantic-similarity-based filtering for establishing the persona pool.
  • Figure 3: While previous work uses sampling to generate multiple responses and recruits human annotators to rank them based on general pre-defined principles (Ouyang et al., 2022), we use diverse personas to guide the conversation and implement multi-LLM collaboration to generate the preference dataset. Instead of single-turn pairwise responses, our approach can construct tree-structured multi-turn conversations.
  • Figure 4: Visualized performance of four base LLMs and their fine-tuned variants across ten conversation rounds. Note that all four plots share the same x and y-axis ranges.
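
The filtering step named in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: a bag-of-words cosine similarity stands in for whatever semantic similarity measure the paper uses, and the names `cosine` and `filter_personas` and the `0.6` threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two texts based on word counts.

    Assumption: this is a simple stand-in for an embedding-based
    semantic similarity.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def filter_personas(pool, candidates, threshold=0.6):
    """Keep a candidate only if it is not too similar to any persona
    already in the pool (including candidates accepted earlier)."""
    kept = list(pool)
    for cand in candidates:
        if all(cosine(cand, p) < threshold for p in kept):
            kept.append(cand)
    return kept
```

In an iterative setup, the pool returned by `filter_personas` would seed the next round of generation, so near-duplicate personas are rejected before they can influence later rounds.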