
Reasoning Boosts Opinion Alignment in LLMs

Frédéric Berdoz, Yann Billeter, Yann Vonlanthen, Roger Wattenhofer

TL;DR

Results indicate that reasoning enhances opinion modeling and is competitive with strong baselines, but does not fully remove bias, highlighting the need for additional mechanisms to build faithful political digital twins using LLMs.

Abstract

Opinion modeling aims to capture individual or group political preferences, enabling applications such as digital democracies, where models could help shape fairer and more popular policies. Given their versatility, strong generalization capabilities, and demonstrated success across diverse text-to-text applications, large language models (LLMs) are natural candidates for this task. However, due to their statistical nature and limited causal understanding, they tend to produce biased opinions when prompted naively. In this work, we study whether reasoning can improve opinion alignment. Motivated by the recent advancement in mathematical reasoning enabled by reinforcement learning (RL), we train models to produce profile-consistent answers through structured reasoning. We evaluate our approach on three datasets covering U.S., European, and Swiss politics. Results indicate that reasoning enhances opinion modeling and is competitive with strong baselines, but does not fully remove bias, highlighting the need for additional mechanisms to build faithful political digital twins using LLMs. By releasing both our method and datasets, we establish a solid baseline to support future research on LLM opinion alignment.

Paper Structure (54 sections, 1 equation, 12 figures, 15 tables)


Figures (12)

  • Figure 1: GRPO for opinion alignment. We use public opinion surveys to align LLMs with individual preference profiles. First, the LLM is fine-tuned with (synthetic) statements and ground-truth answers to adhere to the reasoning template. After fine-tuning, the model answers in the correct format, but is not fully aligned with opinions. We use GRPO with a reward model that rewards proper formatting and correct answers to further improve reasoning.
  • Figure 2: Agents are more centrist and conservative. First two dimensions of the principal component analysis of all candidates (small dots) standing for election in the 2023 Swiss national elections (smartvote). The x- and y-axes correspond to the left-right and conservative-liberal spectra, respectively. The shifts between the positions of the candidates (big dots) included in the smartvote dataset and their agents (gold) are depicted by black lines. Unlike results in the literature, which indicate a left-libertarian bias [exler_large_2025; hartmann_political_2023; rozado_political_2024], we observe that our agents are shifted towards the (center-)right. The average distortions between ground-truth and agent positions for each group (big arrows) show an overall trend towards more conservative positions (negative y) and no clear left-right bias. Implementation details are given in the appendix (PCA details).
  • Figure 3: Not all political positions are equally learnable. F1 scores reveal that while SFT+GRPO typically works best, every training method underperforms on center and right-leaning groups. Error bars show variance within groups. These disparities may stem from biases baked into Llama 3.1 8B or from inherent differences in how well various political preferences can be learned from survey data. Either way, the results demonstrate that ideology impacts the learnability of political preferences. A detailed figure with the original parties and ideology groups is presented in the appendix (additional figures).
  • Figure 4: Learning neutral stances is hard. On ANES, there is a strong correlation between predictive performance and the neutral base rate on the test set. Individuals in the Right group, and to some extent in the Center group, tend to answer questions with Neutral more often. Both in terms of F1 (top left) and accuracy (bottom left), we observe a negative correlation that is significant at the 5% level. Regression details are given in the appendix (regression tables). Top right: Recall by class on ANES. Averaged over all individuals, the model struggles the most with predicting the Neutral class. Bottom right: Performance in terms of F1 score on ANES before (solid) and after (shaded) removing Neutral instances. All groups improve, and the gap between Left and Center narrows. However, the difference between Left and Right remains. While the large number of Neutrals likely depressed performance on the other two classes during training, this could also suggest that the performance disparity is affected by other factors, such as model bias or artifacts from recoding ANES. All results were obtained from Llama 3.1 8B trained with SFT+GRPO.
  • Figure 5: Biased SFT data consistently impairs the performance of underrepresented groups. F1 scores for Llama trained with SFT+GRPO on differently biased datasets. Left: Data with a progressive bias strongly impairs the Right group, without consistently benefiting the Left group. Middle: Performance on the default dataset. Right: Data with a conservative bias decreases the performance of the Left group, while showing no consistent improvement for the Right group. These asymmetric effects suggest that ideological bias in SFT data primarily harms the underrepresented perspective rather than systematically improving performance on the overrepresented one.
  • ...and 7 more figures
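
The caption of Figure 1 describes a reward that scores proper formatting and correct answers, used inside GRPO. The following is a minimal sketch of that idea, not the authors' implementation. The `<think>`/`<answer>` template, the bonus values, and the function names `reward` and `group_advantages` are all assumptions made for illustration; the paper's actual reasoning template and reward weights are not given in this excerpt. The advantage step follows the usual GRPO recipe of normalizing each reward by the mean and standard deviation of its sampled group.

```python
import re
from statistics import mean, pstdev

# Hypothetical reasoning template (an assumption, not taken from the paper):
# the completion must be exactly <think>...</think> followed by <answer>...</answer>.
TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def reward(completion: str, gold: str,
           format_bonus: float = 0.5, correct_bonus: float = 1.0) -> float:
    """Score one completion: partial credit for format, more for a correct answer.

    The bonus values are illustrative placeholders.
    """
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns nothing
    score = format_bonus
    answer = match.group(2).strip().lower()
    if answer == gold.strip().lower():
        score += correct_bonus
    return score


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against zero variance
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one survey statement whose gold answer is "agree".
    completions = [
        "<think>The profile favors this policy.</think><answer>Agree</answer>",
        "<think>The profile opposes this policy.</think><answer>Disagree</answer>",
        "Agree",
        "<think>Consistent with earlier answers.</think><answer>agree</answer>",
    ]
    rewards = [reward(c, "agree") for c in completions]
    print(rewards)
    print(group_advantages(rewards))
```

Completions that beat their group's average reward receive positive advantages and the rest negative ones, so a group in which every sample earns the same reward contributes no learning signal.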