Aligning Language Models to User Opinions

EunJeong Hwang, Bodhisattwa Prasad Majumder, Niket Tandon

TL;DR

This work aligns LLMs to individual users rather than to broad demographic groups by drawing on three signals: demographics, ideology, and, above all, the user's past opinions. Using the OpinionQA dataset and GPT-3 prompts, it shows that adding past opinions to demographic and ideological signals improves QA accuracy by up to about 7 points, and that the top-k most relevant past opinions are often sufficient. The study also analyzes group-level baselines and finds that memory-based personalization can outperform group-only prompts. It raises ethical concerns about echo chambers and proposes mitigations. Overall, the paper presents a memory-informed, multi-signal approach to personalized LLM alignment and outlines extensions for continuous opinion memory and safeguards against biased amplification.

Abstract

An important aspect of developing LLMs that interact with humans is to align models' behavior to their users. It is possible to prompt an LLM into behaving as a certain persona, especially a user group or ideological persona the model captured during its pretraining stage. However, how best to align an LLM with a specific user, rather than a demographic or ideological group, remains an open question. Mining public opinion surveys (by Pew Research), we find that the opinions of a user and their demographics and ideologies are not mutual predictors. We use this insight to align LLMs by modeling user opinions as well as user demographics and ideology, achieving accuracy gains of up to 7 points in predicting public opinions from survey questions across a broad set of topics. In addition to the typical approach of prompting LLMs with demographics and ideology, we discover that utilizing the most relevant past opinions from individual users enables the model to predict user opinions more accurately.
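
The retrieval idea described in the abstract, selecting a user's most relevant past opinions and placing them in the prompt together with demographics and ideology, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`top_k_opinions`, `build_prompt`), the prompt layout, and the three-dimensional toy vectors standing in for real embeddings are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_opinions(
    query_vec: Sequence[float],
    memory: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the k past opinions whose embeddings are closest to the question.

    `memory` holds (opinion_text, embedding) pairs; in practice the embeddings
    would come from an embedding model, here they are toy vectors.
    """
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(
    demographics: str,
    ideology: str,
    opinions: List[str],
    question: str,
    choices: List[str],
) -> str:
    """Assemble a prompt from all three signals (layout is an assumption)."""
    lines = [f"Demographics: {demographics}", f"Ideology: {ideology}", "Past opinions:"]
    lines += [f"- {opinion}" for opinion in opinions]
    lines.append(f"Question: {question}")
    lines += [f"{chr(ord('A') + i)}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


# Toy memory of one user's past opinions with made-up embeddings.
memory = [
    ("Gun laws should be stricter.", [0.9, 0.1, 0.0]),
    ("Climate change is a major threat.", [0.1, 0.9, 0.1]),
    ("Background checks should be universal.", [0.8, 0.2, 0.1]),
]
question_vec = [1.0, 0.0, 0.0]  # stand-in embedding of a gun-policy question

relevant = top_k_opinions(question_vec, memory, k=2)
prompt = build_prompt(
    demographics="Age 30-49, college graduate",
    ideology="Moderate",
    opinions=relevant,
    question="Should it be harder to obtain a gun legally?",
    choices=["Yes", "No"],
)
print(prompt)
```

Restricting the prompt to the top-k opinions keeps it short and leaves out unrelated opinions, which may otherwise mislead the model.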

Paper Structure (33 sections, 8 figures, 6 tables)

This paper contains 33 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: An illustrative example that shows opinions can vary even when two individuals have the exact same demographic traits.
  • Figure 2: Topic-wise agreement score (x-axis: agreement score; y-axis: topic). Users with similar demographics/ideology can hold different opinions: Cohen's kappa scores of around 0.4 indicate some, but not substantial, agreement in opinions.
  • Figure 3: Prompt using demographics, ideology, and the top-$k$ past opinions (selected with GPT embeddings) to predict the answer to a question.
  • Figure 4: An example of an irrelevant opinion confusing the model.
  • Figure 5: Prompt for the implicit-only model.
  • ...and 3 more figures
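
The agreement score in the Figure 2 caption is Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance. The sketch below computes it for two hypothetical users answering the same questions. The answer lists are invented for illustration, and how the paper aggregates scores across user pairs and topics is not stated in this summary.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equally long sequences of categorical answers."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of questions answered identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: from each user's own answer frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both users always give the same single answer
    return (p_o - p_e) / (1 - p_e)


# Invented answers from two users with identical demographics.
user_1 = ["agree"] * 5 + ["disagree"] * 5
user_2 = ["agree"] * 4 + ["disagree"] * 5 + ["agree"]

print(round(cohen_kappa(user_1, user_2), 2))
```

Here the two users agree on 8 of 10 questions, yet chance alone would produce 50% agreement, so kappa is noticeably lower than the raw agreement rate.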