From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers

Yi-Fei Liu, Yi-Long Lu, Di He, Hang Zhang

TL;DR

The paper investigates whether large language models can precisely infer the inter-construct network of human psychological traits from minimal data, and whether such performance reflects genuine reasoning or pattern matching. Using a zero-shot paradigm that maps 20 Big Five items to nine other scales, the authors show that LLM-derived inter-scale correlations align with human data with $R^2 > 0.89$, and that predictions exhibit a structural amplification characterized by a slope $k > 1$. They describe a two-stage reasoning mechanism: information selection that highlights high-level factors, followed by information compression into abstract natural language summaries that robustly predict target scales. Combining these summaries with the raw scores improves alignment beyond the raw scores alone. Across models and conditions, the amplification correlates strongly with predictive accuracy. The work offers a mechanistic interpretation of AI reasoning in psychology, suggesting LLMs can serve as precise psychological simulators and shedding light on emergent abstract reasoning in large models.

Abstract

Psychological constructs within individuals are widely believed to be interconnected. We investigated whether and how Large Language Models (LLMs) can model the correlational structure of human psychological traits from minimal quantitative inputs. We prompted various LLMs with Big Five Personality Scale responses from 816 human individuals to role-play their responses on nine other psychological scales. LLMs demonstrated remarkable accuracy in capturing human psychological structure, with the inter-scale correlation patterns from LLM-generated responses strongly aligning with those from human data $(R^2 > 0.89)$. This zero-shot performance substantially exceeded predictions based on semantic similarity and approached the accuracy of machine learning algorithms trained directly on the dataset. Analysis of reasoning traces revealed that LLMs use a systematic two-stage process: First, they transform raw Big Five responses into natural language personality summaries through information selection and compression, analogous to generating sufficient statistics. Second, they generate target scale responses based on reasoning from these summaries. For information selection, LLMs identify the same key personality factors as trained algorithms, though they fail to differentiate item importance within factors. The resulting compressed summaries are not merely redundant representations but capture synergistic information--adding them to original scores enhances prediction alignment, suggesting they encode emergent, second-order patterns of trait interplay. Our findings demonstrate that LLMs can precisely predict individual participants' psychological traits from minimal data through a process of abstraction and reasoning, offering both a powerful tool for psychological simulation and valuable insights into their emergent reasoning capabilities.

Paper Structure

This paper contains 42 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Procedural flowchart for Experiment 1. In Phase 1, the LLM was conditioned on each individual's Big Five scores and tasked with role-playing to predict the individual's scores on nine other psychological scales. In Phase 2, we performed dataset-level structural analysis by first computing correlation matrices between all psychological scale components for both LLM predictions and human ground truth data and then comparing the resulting correlation matrices, which reveals a linear amplification in LLMs' reconstructed psychological structures.
  • Figure 2: LLMs' linear amplification of the correlational structure of human psychological traits. The top-left panel is a heatmap comparing the correlations from human data with those predicted by Gemini 2.5 (paired rows), separately for each of the Big Five personality factors (rows) and the target psychological scales or their sub-scales (columns). The top-right panel plots Gemini 2.5's correlations against human data correlations, revealing a strong linear relationship ($R^2 = 0.92$) with an amplification slope ($k = 1.42$). Each dot denotes a pair of correlations in the top-left panel (color codes different target scales; see the scales appendix for a complete list). The bottom-left panel shows that all tested LLMs exhibit an amplification coefficient $k > 1.0$, consistently outperforming retrieval (KNN) and semantic similarity models. The bottom-right panel shows a near-perfect linear relationship ($R^2 = 0.95$) between a model's amplification coefficient ($k$) and its predictive performance. Each dot denotes one model.
  • Figure 3: Flowchart of the "Reasoning-to-Annotation" analyses in Experiment 2. Each reasoning trace comes from Experiment 1, where "Reasoning Models" use the 20 input scores to generate predictions along with reasoning traces. These traces are processed in two parallel analyses: an "Annotation Model" parses each trace to create a structured attribution vector (addressing our first research question), while the summary within that same trace is used to predict the outcome, testing its predictive potency (addressing our second research question).
  • Figure 4: Analysis of the LLM information selection mechanism, demonstrating a concept-driven strategy. The plots compare LLM attribution distributions (solid lines) to a Bayesian Ridge Regression baseline (dashed blue line). The top row shows item-level attributions, revealing noisy alignment. The bottom row shows factor-level attributions, demonstrating near-perfect alignment. This indicates models correctly identify high-level factors (e.g., Neuroticism) despite their confusion about specific item weights within factors.
  • Figure 5: Analysis of the efficacy and synergy of LLM-generated summaries. The left panel shows the amplification effect persists even when using only the abstract summary ($R^2=0.91$). The right panel compares the amplification multiplier for all models across the three information conditions, demonstrating the synergistic value of adding the summary.
  • ...and 8 more figures
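
The dataset-level structural analysis described in the captions for Figures 1 and 2 can be illustrated with a minimal Python sketch. It correlates each Big Five factor with each target scale, separately for human data and for LLM predictions, then regresses the LLM correlations on the human ones to obtain a slope $k$ and an $R^2$. Everything here is an assumption made for illustration: the function names `cross_correlations` and `amplification_fit` are hypothetical, the data are synthetic, and the excerpt does not say whether the authors fit an intercept or how they pool correlation pairs.

```python
import numpy as np


def cross_correlations(inputs, targets):
    """Pearson correlation of every input column with every target column.

    inputs:  (n_people, n_inputs) array, e.g. Big Five factor scores
    targets: (n_people, n_targets) array, e.g. target scale scores
    Returns an (n_inputs, n_targets) matrix.
    """
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_in = inputs.shape[1]
    full = np.corrcoef(np.hstack([inputs, targets]), rowvar=False)
    return full[:n_in, n_in:]


def amplification_fit(human_corr, llm_corr):
    """Least-squares line llm = k * human + b over all correlation pairs.

    Returns (k, b, r2). Fitting an intercept is an assumption of this sketch.
    """
    x = np.asarray(human_corr, dtype=float).ravel()
    y = np.asarray(llm_corr, dtype=float).ravel()
    k, b = np.polyfit(x, y, 1)  # coefficients come highest degree first
    ss_res = np.sum((y - (k * x + b)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return k, b, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the paper's data): 816 people, 5 factors, 9 targets.
rng = np.random.default_rng(0)
factors = rng.normal(size=(816, 5))
weights = rng.normal(size=(5, 9))
signal = factors @ weights
human_targets = signal + 3.0 * rng.normal(size=signal.shape)  # noisier responses
llm_targets = signal + 1.0 * rng.normal(size=signal.shape)    # less noisy responses

human_corr = cross_correlations(factors, human_targets)
llm_corr = cross_correlations(factors, llm_targets)
k, b, r2 = amplification_fit(human_corr, llm_corr)
print(f"slope k = {k:.2f}, intercept b = {b:.2f}, R^2 = {r2:.2f}")
```

In this toy setup the simulated "LLM" responses carry the same signal as the "human" ones but with less noise, so their correlations with the input factors are larger in magnitude and the fitted slope exceeds 1. This is only one way a slope $k > 1$ could arise and is not a claim about the mechanism in the paper.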