Improving the Distributional Alignment of LLMs using Supervision

Gauri Kambhatla, Sanjana Gautam, Angela Zhang, Alex Liu, Ravi Srinivasan, Junyi Jessy Li, Matthew Lease

TL;DR

This work addresses the problem of aligning LLM-generated distributions with human opinion distributions on subjective questions by introducing supervised calibration of distributions elicited from LLMs. It demonstrates that simply prompting with sociodemographic information (SD prompting) yields inconsistent improvements, whereas a lightweight supervised calibration step consistently improves alignment across datasets, models, and elicitation methods while substantially reducing variance across settings. Calibration requires surprisingly little supervised data (often 1–10 examples) and remains effective across post-training regimes, suggesting a robust path to more pluralistic and stable alignment. By evaluating 15 models on three large-scale surveys, the paper provides actionable insights and a public benchmark to stimulate future research in distributional alignment, while acknowledging ethical considerations and the limitations of demographic modeling.
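The idea of a lightweight supervised calibration step can be illustrated with a very small regression. The sketch below is an assumption and not the paper's method. It fits a single affine map q ≈ w * p + b from LLM-elicited answer probabilities p to human answer shares q by ordinary least squares, pooling every answer option of every supervised question, and then clips and renormalizes the result. The names `fit_affine_calibrator` and `calibrate` and the toy data are hypothetical.

```python
from typing import List, Tuple

Dist = List[float]  # probabilities over a question's answer options
Pair = Tuple[Dist, Dist]  # (LLM-elicited distribution, human distribution)


def fit_affine_calibrator(pairs: List[Pair]) -> Tuple[float, float]:
    """Hypothetical calibrator: fit q_i ~ w * p_i + b by ordinary least
    squares, pooling every answer option of every supervised question."""
    xs = [p_i for p, _ in pairs for p_i in p]
    ys = [q_i for _, q in pairs for q_i in q]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx if sxx > 0 else 0.0  # no spread in p: fall back to a constant
    b = mean_y - w * mean_x
    return w, b


def calibrate(p: Dist, w: float, b: float, eps: float = 1e-6) -> Dist:
    """Apply the affine map, clip to a small positive floor, renormalize."""
    raw = [max(w * p_i + b, eps) for p_i in p]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Toy supervision: an overconfident LLM versus flatter human answers.
    train = [([0.9, 0.1], [0.6, 0.4]), ([0.2, 0.8], [0.4, 0.6])]
    w, b = fit_affine_calibrator(train)
    print(w, b)  # roughly 0.28 and 0.36
    print(calibrate([0.7, 0.3], w, b))
```

When every question has the same number of options n, the pooled means of p and q are both 1/n, so the intercept works out to b = (1 - w)/n. The fit then reduces to interpolating between the LLM distribution and the uniform distribution (or sharpening it when w > 1). A model with effectively one free parameter shows how a handful of supervised examples could suffice; the paper's own regression models may be richer.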

Abstract

The ability to accurately align LLMs with human population groups on subjective questions would have great value. In this work, we show that simple supervision can greatly and more consistently improve language model alignment with diverse population groups, as measured over three datasets spanning various topics. Beyond evaluating average alignment, we also report how alignment varies across specific groups. Our broad findings provide insights into the distributional alignment of LLMs with diverse population groups. By evaluating many LLMs and prompting strategies and open-sourcing our work, we provide a benchmark to stimulate future research.
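Reporting how alignment varies across groups requires a per-group score. The sketch below assumes the OpinionQA-style metric from prior work, 1 - W1/(N - 1), scaled here to 0 to 100, where W1 is the 1-Wasserstein distance between model and human distributions over N ordered answer options. The paper's exact metric, and what the standard deviations in its Figure 2 are computed over, may differ. The function names and group labels are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Dist = List[float]  # probabilities over ordered answer options


def wasserstein_ordinal(p: Dist, q: Dist) -> float:
    """1-Wasserstein distance between two distributions over the same
    ordered options placed at positions 0, 1, ..., N-1 (sum of CDF gaps)."""
    dist = cdf_p = cdf_q = 0.0
    for p_i, q_i in zip(p[:-1], q[:-1]):
        cdf_p += p_i
        cdf_q += q_i
        dist += abs(cdf_p - cdf_q)
    return dist


def opinion_alignment(p: Dist, q: Dist) -> float:
    """Assumed OpinionQA-style score, scaled to 0-100: 100 * (1 - W1 / (N - 1))."""
    n = len(p)
    if n < 2 or len(q) != n:
        raise ValueError("need two matching distributions with at least 2 options")
    return 100.0 * (1.0 - wasserstein_ordinal(p, q) / (n - 1))


def per_group_alignment(
    pairs: Dict[str, Tuple[Dist, Dist]],
) -> Tuple[Dict[str, float], float, float]:
    """Score each group's (model, human) pair; return scores, mean, and spread."""
    scores = {g: opinion_alignment(m, h) for g, (m, h) in pairs.items()}
    values = list(scores.values())
    return scores, mean(values), pstdev(values)


if __name__ == "__main__":
    # Hypothetical groups answering one 3-point question.
    groups = {
        "group_a": ([0.6, 0.3, 0.1], [0.5, 0.3, 0.2]),
        "group_b": ([0.6, 0.3, 0.1], [0.1, 0.3, 0.6]),
    }
    print(per_group_alignment(groups))
```

Dividing by N - 1, the largest possible W1 on N ordered options, keeps scores comparable across questions with different numbers of answer choices. The population standard deviation across groups is one simple way to summarize how uneven alignment is.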

Paper Structure

This paper contains 39 sections, 1 equation, 4 figures, 17 tables.

Figures (4)

  • Figure 1: Prior work studies using persona/sociodemographic prompting to align LLMs with humans for subjective questions. In this work, we elicit distributions from LLMs and calibrate them to better align with human response distributions.
  • Figure 2: Standard deviation vs. opinion alignment. Each point represents the average alignment for one combination of dataset, LLM, and elicitation method. For visual clarity, we omit 43 of the 290 uncalibrated points with opinion alignment below 60. Calibration tends to both increase opinion alignment and decrease standard deviation. It also decreases variance between settings.
  • Figure 3: Mean Squared Error (MSE) of regression models for various training data sizes, using SD-prompted and verbally elicited distributions. Plots are shown for each dataset: (a) WGM, (b) OQA, and (c) WVS. Although the behavior is model- and dataset-dependent, MSE converges within 1 to 10 training examples.
  • Figure 4: Opinion alignment for OLMo-2-7B models with different post-training methods using the verbalized distribution elicitation method. Calibration results in more consistent alignment across all post-training methods, both with and without SD prompting. Without calibration, alignment is much more dataset and post-training method dependent.
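The learning-curve experiment behind Figure 3 can be mimicked on toy data: fit a calibrator on the first k supervised questions and measure held-out MSE for each k. The sketch below assumes a one-parameter regression that shrinks the LLM-elicited distribution toward uniform, q_hat = w * p + (1 - w)/n. This is a stand-in chosen for brevity, not the paper's regression models. The names `fit_shrinkage`, `predict`, `mse`, and `learning_curve`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Tuple

Dist = List[float]  # probabilities over a question's answer options
Pair = Tuple[Dist, Dist]  # (LLM-elicited distribution, human distribution)


def fit_shrinkage(train: List[Pair]) -> float:
    """Hypothetical one-parameter regression: least-squares w for
    q_hat = w * p + (1 - w) * u, where u = 1/n is the uniform share."""
    num = den = 0.0
    for p, q in train:
        u = 1.0 / len(p)
        for p_i, q_i in zip(p, q):
            num += (p_i - u) * (q_i - u)
            den += (p_i - u) ** 2
    return num / den if den > 0 else 0.0


def predict(p: Dist, w: float) -> Dist:
    """Shrink (w < 1) or sharpen (w > 1) p relative to uniform."""
    u = 1.0 / len(p)
    # The unclipped values sum to 1; clipping negatives only raises the total.
    raw = [max(w * p_i + (1.0 - w) * u, 0.0) for p_i in p]
    total = sum(raw)
    return [r / total for r in raw]


def mse(test: List[Pair], w: float) -> float:
    """Mean squared error between calibrated and human probabilities."""
    errs = [(a - b) ** 2 for p, q in test for a, b in zip(predict(p, w), q)]
    return sum(errs) / len(errs)


def learning_curve(train: List[Pair], test: List[Pair], sizes: List[int]) -> Dict[int, float]:
    """Held-out MSE after fitting on the first k supervised questions."""
    return {k: mse(test, fit_shrinkage(train[:k])) for k in sizes}


if __name__ == "__main__":
    # Toy data where human = 0.5 * LLM + 0.5 * uniform, so w = 0.5 is exact.
    train = [([0.9, 0.1], [0.7, 0.3]), ([0.2, 0.8], [0.35, 0.65]), ([0.6, 0.4], [0.55, 0.45])]
    test = [([1.0, 0.0], [0.75, 0.25])]
    print("uncalibrated MSE:", mse(test, 1.0))
    print("learning curve:", learning_curve(train, test, [1, 2, 3]))
```

Setting w = 1 leaves the LLM distribution unchanged, which gives an uncalibrated baseline for comparison. On noiseless toy data a single question already pins down w; with real survey data one would average the curve over several random train/test splits. The toy example makes no claim about the paper's reported numbers.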