Improving the Distributional Alignment of LLMs using Supervision
Gauri Kambhatla, Sanjana Gautam, Angela Zhang, Alex Liu, Ravi Srinivasan, Junyi Jessy Li, Matthew Lease
TL;DR
This work addresses the problem of aligning LLM-generated distributions with human opinion distributions on subjective questions by introducing supervised calibration of distributions elicited from LLMs. It demonstrates that simply prompting with sociodemographic information (SD prompting) yields inconsistent improvements, whereas a lightweight supervised calibration step consistently improves alignment across datasets, models, and elicitation methods while substantially reducing cross-setting variance. Calibration requires surprisingly little supervised data (often 1–10 examples) and remains effective across post-training regimes, suggesting a robust path to more pluralistic and stable alignment. By evaluating 15 models on three large-scale surveys, the paper provides actionable insights and a public benchmark to stimulate future research in distributional alignment, while acknowledging ethical considerations and the limitations of demographic modeling.
Abstract
The ability to accurately align LLMs with human population groups on subjective questions would have great value. In this work, we show that simple supervision improves language model alignment with diverse population groups both substantially and more consistently, as measured over three datasets spanning various topics. Beyond evaluating average alignment, we also report how alignment varies across specific groups. Our broad findings provide insights into the distributional alignment of LLMs with diverse population groups. By conducting evaluation over many LLMs and prompting strategies, and by open-sourcing our work, we provide a benchmark to stimulate future research.
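To make the idea of distributional alignment concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name the alignment metric or the calibration procedure, so both are assumptions here: alignment is measured with total variation distance, and "calibration" is a single temperature chosen by grid search on a handful of supervised examples. All function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Distribution = List[float]


def normalize(weights: Sequence[float]) -> Distribution:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions over the same options.

    Assumption: chosen here for illustration; the source does not specify a metric.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def apply_temperature(p: Sequence[float], temperature: float) -> Distribution:
    """Flatten (temperature > 1) or sharpen (temperature < 1) a distribution."""
    return normalize([a ** (1.0 / temperature) for a in p])


def fit_temperature(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    grid: Sequence[float],
) -> float:
    """Pick the grid temperature minimizing mean distance to the human data.

    `pairs` holds (model_distribution, human_distribution) examples.
    Assumption: a stand-in for supervised calibration, not the paper's method.
    """
    def mean_distance(t: float) -> float:
        return sum(
            total_variation(apply_temperature(m, t), h) for m, h in pairs
        ) / len(pairs)

    return min(grid, key=mean_distance)


# Toy, made-up data: one question with three answer options.
train_pairs = [([0.90, 0.05, 0.05], [0.60, 0.25, 0.15])]
grid = [0.5, 1.0, 2.0, 4.0]
best_t = fit_temperature(train_pairs, grid)

model_dist, human_dist = train_pairs[0]
before = total_variation(model_dist, human_dist)
after = total_variation(apply_temperature(model_dist, best_t), human_dist)
```

Because the grid includes a temperature of 1.0 (no change), the fitted temperature can never do worse than the uncalibrated distribution on the supervised examples. Whether it helps on held-out questions and groups is the empirical question the paper evaluates.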
