Large Language Models as Superpositions of Cultural Perspectives

Grgur Kovač, Masataka Sawayama, Rémy Portelas, Cédric Colas, Peter Ford Dominey, Pierre-Yves Oudeyer

TL;DR

This paper contests the view that LLMs have stable personalities by showing that they exhibit large, context-dependent shifts in expressed values and traits, which the authors frame as a superposition of perspectives. They develop a formal framework, combining three psychology questionnaires (PVQ, VSM, IPIP) with controlled perspective inductions and a new metric, perspective controllability, to quantify how effectively a given prompt induces a target perspective across models. Across 16 models and four induction methods, they find robust, context-driven perspective shifts and varying levels of controllability, with RLHF-tuned models generally showing higher controllability. The work highlights important implications for interpreting AI behavior, designing benchmarks, and aligning AI systems with culturally diverse value systems, while proposing new avenues for measuring and controlling perspective in LLMs.

Abstract

Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personality traits across contexts). We introduce the concept of perspective controllability, which refers to a model's affordance to adopt various perspectives with differing values and personality traits. In our experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study how exhibited values and personality traits change based on different perspectives. Through qualitative experiments, we show that LLMs express different values when those are (implicitly or explicitly) implied in the prompt, and that LLMs express different values even when those are not obviously implied (demonstrating their context-dependent nature). We then conduct quantitative experiments to study the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of various methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by examining the broader implications of our work and outline a variety of associated scientific questions. The project website is available at https://sites.google.com/view/llm-superpositions .

Paper Structure (28 sections, 3 equations, 15 figures, 9 tables)

Figures (15)

  • Figure 1: Inducing a perspective for the PVQ questionnaire. We aim to induce the target personal values of self-enhancement (power and achievement) using a 2nd person perspective transmitted via the system prompt of language models. We then compute the answer of the model conditioned on that perspective for a question from the PVQ questionnaire. This process is repeated independently for every question of the questionnaire and for 50 different permutations of the answer order.
  • Figure 2: Estimating perspective controllability. We put the model in four perspectives, each with different target values (expressed explicitly in the prompt). We query the model with a questionnaire in each perspective. We then score the answers to get the scores for all the values in all the perspectives. For each perspective, we compute the distance between target and other values' scores, and average those estimates to compute the final controllability estimate.
  • Figure 3: Evidence for the unexpected perspective shift effect. The effect of different simulated conversations on: (a) basic personal values, and (b) cultural values. The effect of different textual formats on: (c) basic personal values, and (d) cultural values. The effect of Wikipedia paragraphs about different music genres on: (e) basic personal values, and (f) cultural values. Although these contexts seem orthogonal to the tested values, we found them to cause significant effects on all personal values expressed by ChatGPT except those denoted by a gray background (ANOVA tests). Varying the context (e.g. from Python code questions to C++ code questions, or from a jazz music context to a gospel context) sometimes leads to large shifts in expressed values (e.g. achievement and stimulation, respectively).
  • Figure 4: Fictional characters. Values exhibited by GPT-3.5-0301 in perspectives where the values are implied implicitly through fictional characters. The model expresses different values for different characters, as expected.
  • Figure 5: Music experts. GPT-3.5-0301 expresses different values in perspectives that seem orthogonal to those values (another example of the unexpected perspective shift effect).
  • ...and 10 more figures
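
The controllability estimate described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the "distance" is taken here to be the mean score of the target values minus the mean score of all other values, and the function name `controllability`, the value names, and the toy scores are all hypothetical. The paper's exact formula (for example, any normalization of questionnaire scores) may differ, and only two perspectives are used instead of four to keep the example short.

```python
from statistics import mean


def controllability(scores, targets):
    """Average gap between target and non-target value scores.

    scores:  {perspective: {value_name: score}}  (scored questionnaire answers)
    targets: {perspective: set of value names the prompt tried to induce}
    Assumption: "distance" = mean(target scores) - mean(other scores).
    """
    gaps = []
    for perspective, value_scores in scores.items():
        target = targets[perspective]
        target_mean = mean(s for v, s in value_scores.items() if v in target)
        other_mean = mean(s for v, s in value_scores.items() if v not in target)
        gaps.append(target_mean - other_mean)
    # Final estimate: average the per-perspective gaps.
    return mean(gaps)


# Toy, made-up scores for two induced perspectives (illustrative only).
toy_targets = {
    "self_enhancement": {"power", "achievement"},
    "self_transcendence": {"benevolence", "universalism"},
}
toy_scores = {
    "self_enhancement": {
        "power": 5, "achievement": 5, "benevolence": 2, "universalism": 2,
    },
    "self_transcendence": {
        "power": 1, "achievement": 2, "benevolence": 6, "universalism": 5,
    },
}

print(controllability(toy_scores, toy_targets))  # 3.5
```

Under this assumed definition, a model that gives identical scores to every value in every perspective gets an estimate of 0, and larger positive values mean the induced perspective raises the target values further above the rest.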