Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations

Ishani Mondal; Jack W. Stokes; Sujay Kumar Jauhar; Longqi Yang; Mengting Wan; Xiaofeng Xu; Xia Song; Jennifer Neville

TL;DR

Group Preference Alignment (GPA) is a two-stage framework that personalizes LLM responses for distinct user groups using in-situ conversation logs. It first extracts context-specific preferences into interpretable rubrics via Group-Aware Preference Extraction, then tailors generation through Context-Tuned Inference (GPA-CT) or Rubric-Finetuning Inference (GPA-FT). Experiments on Microsoft Copilot and WildChat show improved alignment with group preferences and user satisfaction without sacrificing standard benchmark performance. The work highlights the value of intent-aware rubrics for robust, data-efficient customization and broad applicability across domains.

Abstract

LLMs often fail to meet the specialized needs of distinct user groups because of their one-size-fits-all training paradigm \cite{lucy-etal-2024-one}, and there is limited research on which aspects of personalization each group expects. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA), that identifies context-specific variations in conversational preferences across user groups and then steers LLMs to address those preferences. Our approach consists of two steps: (1) Group-Aware Preference Extraction, where maximally divergent user-group preferences are extracted from real-world conversation logs and distilled into interpretable rubrics, and (2) Tailored Response Generation, which leverages these rubrics through two methods: (a) Context-Tuned Inference (GPA-CT), which dynamically adjusts responses via context-dependent prompt instructions, and (b) Rubric-Finetuning Inference (GPA-FT), which uses the rubrics to generate contrastive synthetic data for personalizing group-specific models via alignment. Experiments demonstrate that our framework significantly improves alignment of the output with respect to user preferences and outperforms baseline methods, while maintaining robust performance on standard benchmarks.
