MindShift: Analyzing Language Models' Reactions to Psychological Prompts

Anton Vasiliuk, Irina Abdullaeva, Polina Druzhinina, Anton Razzhigaev, Andrey Kuznetsov

TL;DR

MindShift advances a psychometrics-grounded framework to probe how large language models reveal or adopt personality cues. By adapting MMPI-2 scales to LLM outputs and pairing them with structured psychological biases in prompts, the authors establish a benchmark to measure susceptibility, bias sensitivity, and role perception across model families. Key findings include robust validity signals (directional shifts and internal consistency), model-family clustering influenced by architecture and fine-tuning, and meaningful correlations between psychometric traits and safety/accuracy metrics such as TruthfulQA. The work also demonstrates practical pathways for evaluating and aligning psychologically sensitive AI systems, and it provides public tooling to facilitate further research in this area.

Abstract

Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. In our study, we investigated this potential using robust psychometric measures. We adapted the most studied test in the psychological literature, namely the Minnesota Multiphasic Personality Inventory (MMPI), and examined LLMs' behavior to identify traits. To assess the sensitivity of LLMs to prompts and psychological biases, we created personality-oriented prompts, crafting a detailed set of personas that vary in trait intensity. This enables us to measure how well LLMs follow these roles. Our study introduces MindShift, a benchmark for evaluating LLMs' psychological adaptability. The results highlight a consistent improvement in LLMs' role perception, attributed to advancements in training datasets and alignment techniques. Additionally, we observe significant differences in responses to psychometric assessments across different model types and families, suggesting variability in their ability to emulate human-like personality traits. MindShift prompts and code for LLM evaluation will be publicly available.

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Full prompt components: prefix prompt, person description and Test statement. A person description consists of a Persona General Descriptor and a Psychological Bias Descriptor. Supplemental Table 3 details the Psychological Bias Descriptor used in the experiments.
  • Figure 2: Violin plot showing the distribution of Cronbach’s alpha values across all MMPI scales for each model, plotted against VRIN values. Each violin represents one model.
  • Figure 3: Psychological biases perception across base and instruction models.
  • Figure 4: Correlation between MMPI scales and predicted answer length on the left plot and response inconsistency on the right plot.
  • Figure 5: t-SNE visualization of MMPI-2 scores across different language models. Each point represents a personality profile assigned to a model, with color coding differentiating model families. Instruction-tuned models are highlighted with markers.
  • ...and 3 more figures
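
The caption of Figure 2 refers to Cronbach's alpha as the internal-consistency measure for each MMPI scale. The following is a minimal sketch of the standard textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and is not taken from the authors' code. The function name `cronbach_alpha` and the data layout (rows are respondents, such as personas or repeated runs; columns are the items of one scale) are assumptions made for illustration.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: 2-D array-like, shape (n_respondents, n_items).
    Hypothetical layout: each row is one respondent (e.g. one persona
    or one model run), each column is one item of the scale.
    """
    x = np.asarray(item_scores, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two items")

    n_items = x.shape[1]
    # Sample variance of each item across respondents.
    item_variances = x.var(axis=0, ddof=1)
    # Sample variance of the per-respondent total score.
    total_variance = x.sum(axis=1).var(ddof=1)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)


# Toy data: three items answered identically by four respondents.
toy_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

When all items move together, as in the toy data, alpha reaches its maximum of 1; items that vary independently of one another pull it toward 0.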