PrimeX: A Dataset of Worldview, Opinion, and Explanation
Rik Koncel-Kedziorski, Brihi Joshi, Tim Paek
TL;DR
PrimeX is a dataset that jointly captures opinions, free-form explanations, and Primal World Beliefs (Primals) from 858 US residents, built to study worldview-informed personalization of language models. By linking cross-topic opinions with worldview signals and explanatory text, the work demonstrates correlations between Primals and opinions, and shows that including explanations and worldview can improve opinion prediction for PA-LMs, with GPT-4o benefiting more consistently than smaller models in All Topics settings. The authors also quantify the predictive value of explanations via a utility measure and show that high-utility explanations tend to be longer and more closely aligned with the test questions. Separately, they show that a trained model can predict users' Primals from their opinion data. Overall, PrimeX provides a multi-faceted resource for both NLP and psychology research on personalized AI, enabling worldview-aware simulations and deeper analyses of how beliefs shape behavior. The dataset points toward personalization that is more human-centered and relies less on demographic stereotypes, and the authors acknowledge limitations in scope and reproducibility.
Abstract
As the adoption of language models advances, so does the need to better represent individual users to the model. Are there aspects of an individual's belief system that a language model can utilize for improved alignment? Following prior research, we investigate this question in the domain of opinion prediction by developing PrimeX, a dataset of public opinion survey data from 858 US residents with two additional sources of belief information: written explanations from the respondents for why they hold specific opinions, and the Primal World Belief survey for assessing respondent worldview. We provide an extensive initial analysis of our data and show the value of belief explanations and worldview for personalizing language models. Our results demonstrate how the additional belief information in PrimeX can benefit both the NLP and psychological research communities, opening up avenues for further study.
