Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations

Ishani Mondal, Jack W. Stokes, Sujay Kumar Jauhar, Longqi Yang, Mengting Wan, Xiaofeng Xu, Xia Song, Jennifer Neville

TL;DR

Group Preference Alignment (GPA) presents a two-stage framework to personalize LLM responses for distinct user groups using in-situ conversation logs. It first extracts context-specific preferences into interpretable rubrics via Group-Aware Preference Extraction, and then tailors generation through Context-Tuned Inference (GPA-CT) or Rubric-Finetuning Inference (GPA-FT). Experiments on Microsoft Copilot and WildChat show improved alignment with group preferences and user satisfaction without sacrificing standard benchmark performance. The work highlights the value of intent-aware rubrics for robust, data-efficient customization and broad applicability across domains.

Abstract

LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all training paradigm \cite{lucy-etal-2024-one}, and there is limited research on which personalization aspects each group expects. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA), that identifies context-specific variations in conversational preferences across user groups and then steers LLMs to address those preferences. Our approach consists of two steps: (1) Group-Aware Preference Extraction, where maximally divergent user-group preferences are extracted from real-world conversation logs and distilled into interpretable rubrics, and (2) Tailored Response Generation, which leverages these rubrics through two methods: (a) Context-Tuned Inference (GPA-CT), which dynamically adjusts responses via context-dependent prompt instructions, and (b) Rubric-Finetuning Inference (GPA-FT), which uses the rubrics to generate contrastive synthetic data for personalization of group-specific models via alignment. Experiments demonstrate that our framework significantly improves alignment of the output with respect to user preferences and outperforms baseline methods, while maintaining robust performance on standard benchmarks.
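To make the Context-Tuned Inference idea concrete, here is a minimal sketch of how a rubric might be turned into a context-dependent prompt instruction. It is an illustration under stated assumptions, not the authors' implementation: the names `RubricItem`, `RUBRICS`, and `build_prompt`, the prompt wording, and the example rubric entries are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RubricItem:
    """One interpretable preference for a (group, intent) pair (hypothetical schema)."""
    aspect: str
    guidance: str


# Hypothetical rubric store keyed by (user group, intent).
# The entries are invented for illustration; they are not rubrics from the paper.
RUBRICS: Dict[Tuple[str, str], List[RubricItem]] = {
    ("novice", "code debugging"): [
        RubricItem("explanation depth", "Explain each fix step by step and define jargon."),
    ],
    ("expert", "code debugging"): [
        RubricItem("conciseness", "Give the fix directly and skip basic background."),
    ],
}


def build_prompt(
    query: str,
    group: str,
    intent: str,
    rubrics: Dict[Tuple[str, str], List[RubricItem]] = RUBRICS,
) -> str:
    """Prepend rubric-derived instructions for the given group and intent.

    If no rubric exists for the pair, the query is returned unchanged,
    i.e. the model falls back to its default behavior.
    """
    items = rubrics.get((group, intent), [])
    if not items:
        return query
    header = f"The user is a {group} with a '{intent}' request. Follow these preferences:"
    lines = [f"- {item.aspect}: {item.guidance}" for item in items]
    return "\n".join([header, *lines, "", f"User query: {query}"])


if __name__ == "__main__":
    print(build_prompt("Why does my Docker container ignore .env?", "novice", "code debugging"))
```

Keying the lookup on both group and intent mirrors the abstract's emphasis on context-specific preferences: the same user group can receive different instructions for different intents, and an unseen pair leaves the query untouched.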

Paper Structure

This paper contains 30 sections, 2 equations, 6 figures, 20 tables.

Figures (6)

  • Figure 1: Illustration of GPA rubric extraction (Sec. \ref{subsec:extractpref}) showing group-aware preference extraction across two groups (Expert vs. Novice) with conversations about Docker and .env file integration. First, individual preferences are extracted from conversations; then the extracted preferences are grouped into minibatches, contrasted to extract salient differences, and summarized into intent-specific rubrics.
  • Figure 2: Illustration of GPA tailored response generation (Sec. \ref{subsec:Alignment}) for a Novice user with a Code debugging intent using GPA-CT and GPA-FT.
  • Figure 3: Rubrics/Aspects on which Experts and Novices Differ in the Education and Programming Domains as extracted from Microsoft Copilot logs on three tasks (Information Requests, Program Inquiry, Information Seeking) with the Likert-Scale Rating (1-5) on the x-axis and aspect/rubric names on the y-axis.
  • Figure 4: Bar plot evaluating Gemma outputs from intent-aware rubric creation vs. intent-unaware rubric creation using GPA-CT. Results show that intent heavily impacts performance when the GPA approach is used to personalize responses on the Microsoft Copilot test set.
  • Figure 5: Illustration of how random shuffling of expertise labels impacts rubric generation. For most intents, shuffling results in the extraction of predominantly invalid rubric items, reducing both the overall quality and the number of valid rubric extractions.
  • ...and 1 more figure