Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset

Lily Hong Zhang, Smitha Milli, Karen Jusko, Jonathan Smith, Brandon Amos, Wassim Bouaziz, Manon Revel, Jack Kussman, Yasha Sheynin, Lisa Titus, Bhaktipriya Radharapu, Jane Yu, Vidya Sarma, Kris Rose, Maximilian Nickel

TL;DR

This work addresses the challenge of aligning large language models to diverse global user preferences by showing that humans exhibit substantial variation along core value axes (the Inglehart-Welzel, or IW, dimensions) that current model outputs do not capture. It identifies an algorithmic monoculture in responses from 21 LLMs and demonstrates that existing preference-collection methods produce largely homogeneous candidate sets, which hinders learning of heterogeneous preferences. The authors propose negatively-correlated (NC) sampling, a simple prompting strategy that generates more diverse candidate responses and significantly improves downstream learning across IW values for standard alignment methods. Building on this, they collect Community Alignment (CA), the largest open-source, multilingual, multi-turn preference dataset to date (~200,000 comparisons from 3,196 annotators across five countries), featuring NC sampling, non-English data, free-form explanations, and prompt-level annotator overlap. In practical terms, the study exposes gaps in current alignment pipelines, CA provides a rich resource for developing pluralistic alignment techniques, and the work highlights the need for diverse data collection practices so that LLMs can better serve a globally diverse population.

Abstract

How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? To advance this challenge, this paper establishes four key results. First, we demonstrate, through a large-scale multilingual human study with representative samples from five countries (N=15,000), that humans exhibit significantly more variation in preferences than the responses of 21 state-of-the-art LLMs. Second, we show that existing methods for preference dataset collection are insufficient for learning the diversity of human preferences even along two of the most salient dimensions of variability in global values, due to the underlying homogeneity of candidate responses. Third, we argue that this motivates the need for negatively-correlated sampling when generating candidate sets, and we show that simple prompt-based techniques for doing so significantly enhance the performance of alignment methods in learning heterogeneous preferences. Fourth, based on this novel candidate sampling approach, we collect and open-source Community Alignment, the largest and most representative multilingual and multi-turn preference dataset to date, featuring almost 200,000 comparisons from annotators spanning five countries. We hope that the Community Alignment dataset will be a valuable resource for improving the effectiveness of LLMs for a diverse global population.

Paper Structure

This paper contains 37 sections, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Human pluralism vs. algorithmic monoculture. Individuals show substantial heterogeneity in the values they prefer in LLM responses, even within the U.S. (left). However, all 21 state-of-the-art language models systematically produce responses that lean towards secular-rational and self-expression values (right). See \ref{fig:phase1-all-countries} for results in France, Italy, India, and Brazil.
  • Figure 2: Temperature sampling has limited coverage of Inglehart-Welzel values (blue), but NC sampling yields Pareto improvements (orange). For the set of everyday prompts curated in \ref{sec:phase1}, each plot shows the proportion of times that a given sampling method yields at least one response aligning with a given value within a set of four candidate responses. State-of-the-art chat models achieve coverage of traditional and survival values in only 20–40% of cases, meaning that 60–80% of the time such values are not represented at all in an option set of four responses. In contrast, NC sampling yields Pareto improvements in the coverage of all four values. See \ref{app:qual_candidate_sets} for qualitative examples of the candidate sets generated by temperature sampling and NC sampling.
  • Figure :
  • Figure B.1: An overview of our joint human survey and model evaluation. We conduct a nationally representative human survey in which participants choose their preferred response from a set of responses that varies along one of the two Inglehart-Welzel value dimensions. We also evaluate default LLM generations on the same prompts and score them against the same balanced responses shown to human participants. We perform this joint human survey and model evaluation across five countries and languages, with a representative sample of individual participants (N=15,000) and 21 open-source and commercial language models.
  • Figure B.2: Results for all countries and languages in the joint human study and model evaluation described in \ref{sec:humans-vs-llms}. While individual preferences within each country show high heterogeneity, LLMs in all languages produce responses that are predominantly aligned with secular-rational and self-expression-oriented values, except in Hindi, where some models switch to producing responses that express survival-oriented and traditional values.
  • ...and 5 more figures
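
The coverage statistic described in the Figure 2 caption (the share of four-response candidate sets that contain at least one response aligned with a given value) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `coverage`, the label strings, and the toy candidate sets are all hypothetical, and the sketch assumes each response has already been tagged with the values it expresses by some separate judge or classifier that the summary does not specify.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical label names for the four Inglehart-Welzel value poles.
VALUES = ("traditional", "secular-rational", "survival", "self-expression")


def coverage(
    candidate_sets: Sequence[List[Set[str]]],
    values: Sequence[str] = VALUES,
) -> Dict[str, float]:
    """Return, per value, the fraction of candidate sets in which
    at least one response carries that value label."""
    if not candidate_sets:
        raise ValueError("need at least one candidate set")
    result = {}
    for value in values:
        # A set "covers" a value if any of its responses is tagged with it.
        hits = sum(
            1
            for responses in candidate_sets
            if any(value in labels for labels in responses)
        )
        result[value] = hits / len(candidate_sets)
    return result


# Toy, made-up data: two candidate sets of four responses each,
# where every response is represented only by its set of value labels.
toy_sets = [
    [{"secular-rational"}, {"secular-rational", "self-expression"},
     {"self-expression"}, {"secular-rational"}],
    [{"self-expression"}, {"self-expression"},
     {"traditional"}, {"secular-rational"}],
]

toy_coverage = coverage(toy_sets)
print(toy_coverage)
```

On this toy input, only one of the two sets contains a traditional response and neither contains a survival response, so the statistic is 0.5 for traditional and 0.0 for survival. Under this definition, a sampling method improves coverage only by placing at least one differently aligned response into a set, which is why near-duplicate candidates leave the number low regardless of how many sets are sampled.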