
Can One-sided Arguments Lead to Response Change in Large Language Models?

Pedro Cisneros-Velarde

TL;DR

This work examines whether supplying one-sided arguments can steer large language models to adopt a targeted viewpoint on binary polemic questions. Using a dataset of polemic prompts across historical, political, and religious topics, and evaluating multiple models along three prompt dimensions (response type, personal vs. non-personal framing, and dialog vs. block argument presentation), the study finds robust opinion steering toward the presented viewpoint. Steering is strongest when the question format aligns with the argument display and diminishes when arguments are swapped or unrelated. The results highlight a potential vulnerability in alignment safeguards and underscore the role of argument content in shaping LLM responses, with implications for safety, debiasing, and multi-agent information exchange.

Abstract

Polemic questions need more than one viewpoint to express a balanced answer. Large Language Models (LLMs) can provide a balanced answer, but they can also take a single aligned viewpoint or refuse to answer. In this paper, we study whether such initial responses can be steered to a specific viewpoint in a simple and intuitive way: by providing only one-sided arguments supporting the viewpoint. Our systematic study has three dimensions: (i) which stance is induced in the LLM response, (ii) how the polemic question is formulated, and (iii) how the arguments are shown. We construct a small dataset and, remarkably, find that opinion steering occurs across (i)-(iii) for diverse models, numbers of arguments, and topics. Switching to other arguments consistently decreases opinion steering.

Paper Structure (13 sections, 17 tables)