PrimeX: A Dataset of Worldview, Opinion, and Explanation

Rik Koncel-Kedziorski, Brihi Joshi, Tim Paek

TL;DR

PrimeX is a dataset that jointly captures opinions, free-form explanations, and Primal World Beliefs (Primals) from 858 US residents, built to study worldview-informed personalization of language models. By linking cross-topic opinions with worldview signals and explanatory text, the work demonstrates correlations between Primals and opinions and shows that including explanations and worldview can improve opinion prediction for PA-LMs, with GPT-4o benefiting more consistently than smaller models in All Topics settings. The authors also quantify the predictive value of explanations with a utility measure, finding that high-utility explanations tend to be longer and more closely aligned with the test questions, and they show that a trained model can predict a user's Primals from their opinion data. Overall, PrimeX offers a multi-faceted resource for both NLP and psychology research on personalized AI, enabling worldview-aware simulations and deeper analyses of how beliefs shape behavior. The dataset points toward more human-centered, less demographic-stereotype-driven personalization, while the authors acknowledge limitations in scope and reproducibility.

Abstract

As the adoption of language models advances, so does the need to better represent individual users to the model. Are there aspects of an individual's belief system that a language model can utilize for improved alignment? Following prior research, we investigate this question in the domain of opinion prediction by developing PrimeX, a dataset of public opinion survey data from 858 US residents with two additional sources of belief information: written explanations from the respondents for why they hold specific opinions, and the Primal World Belief survey for assessing respondent worldview. We provide an extensive initial analysis of our data and show the value of belief explanations and worldview for personalizing language models. Our results demonstrate how the additional belief information in PrimeX can benefit both the NLP and psychological research communities, opening up avenues for further study.

Paper Structure

This paper contains 39 sections, 3 equations, 9 figures, and 16 tables.

Figures (9)

  • Figure 1: Overview of the PrimeX data. We collect three types of responses from a diverse pool of participants: Opinions from 3 Pew Research surveys; explanations for 3 opinions per survey; and Primal World Belief survey of participant worldview.
  • Figure 2: Examples of opinions with user explanations.
  • Figure 3: Distribution of Utility Scores for Explanations in PrimeX.
  • Figure 4: Utility Scores by Question. This figure is best viewed in color.
  • Figure 6: Examples of opinions with user explanations.
  • ...and 4 more figures