From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers

Yi-Fei Liu; Yi-Long Lu; Di He; Hang Zhang

TL;DR

The paper investigates whether large language models can precisely infer the inter-construct network of human psychological traits from minimal data, and whether such performance reflects genuine reasoning or mere pattern matching. The authors use a zero-shot paradigm that maps responses on 20 Big Five items to nine other scales. They show that LLM-derived inter-scale correlations align with human data ($R^2 > 0.89$) and that the predictions exhibit a structural amplification, characterized by a slope $k > 1$. They identify a two-stage reasoning mechanism. The first stage is information selection, which highlights high-level personality factors. The second is information compression into abstract natural-language summaries that robustly predict the target scales. Combining these summaries with the raw scores even improves prediction beyond the raw scores alone. Across models and conditions, the degree of amplification correlates strongly with predictive accuracy, and rigorous validation confirms that these effects are statistically significant and robust. The work offers a mechanistic interpretation of AI reasoning in psychology. It suggests that LLMs can serve as precise psychological simulators and sheds light on emergent abstract reasoning in large models.
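The sketch below shows one way the two headline numbers could be computed. It rests on two assumptions that the summary does not state. First, $R^2$ is taken to be the squared Pearson correlation between the human and the LLM pairwise inter-scale correlations. Second, $k$ is taken to be the ordinary least-squares slope when the LLM correlations are regressed on the human ones. The function names and the dict-of-lists data layout are hypothetical, and this is not the authors' code.

```python
# Sketch under assumptions: R^2 and k are computed over the unique pairs of
# scales; function names and data layout are hypothetical, not the authors'.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def inter_scale_correlations(scores):
    """Map every unordered pair of scales to the Pearson r of their scores.

    `scores` is a hypothetical layout: {scale_name: [score per participant]},
    with participants listed in the same order for every scale.
    """
    names = sorted(scores)
    return {(a, b): pearson(scores[a], scores[b]) for a, b in combinations(names, 2)}


def structural_alignment(human_scores, llm_scores):
    """Regress LLM-derived pairwise correlations on the human ones.

    Returns (k, intercept, r2). A slope k > 1 means differences between
    human inter-scale correlations are magnified in the LLM data; r2 is the
    squared Pearson correlation between the two sets of pairwise correlations.
    """
    human = inter_scale_correlations(human_scores)
    llm = inter_scale_correlations(llm_scores)
    pairs = sorted(human)
    x = [human[p] for p in pairs]  # human r for each scale pair
    y = [llm[p] for p in pairs]    # LLM r for the same scale pair
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    k = sxy / sxx  # OLS slope of LLM r on human r
    intercept = my - k * mx
    r2 = pearson(x, y) ** 2
    return k, intercept, r2
```

With real data, `human_scores` would hold each participant's scale scores, and `llm_scores` would hold the role-played scores for the same participants in the same order. Passing identical inputs, or any positive linear rescaling of each scale, yields $k = 1$ and $R^2 = 1$. This follows from the fact that Pearson correlations are unchanged by such transformations. Reading $k > 1$ as "amplification" is one plausible interpretation of the slope described above.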

Abstract

Psychological constructs within individuals are widely believed to be interconnected. We investigated whether and how Large Language Models (LLMs) can model the correlational structure of human psychological traits from minimal quantitative inputs. We prompted various LLMs with Big Five Personality Scale responses from 816 human participants and asked the models to role-play those participants' responses on nine other psychological scales. LLMs captured human psychological structure with remarkable accuracy: the inter-scale correlation patterns of LLM-generated responses aligned strongly with those of the human data ($R^2 > 0.89$). This zero-shot performance substantially exceeded predictions based on semantic similarity and approached the accuracy of machine learning algorithms trained directly on the dataset. Analysis of reasoning traces revealed that LLMs use a systematic two-stage process. First, they transform raw Big Five responses into natural-language personality summaries through information selection and compression, analogous to generating sufficient statistics. Second, they generate target-scale responses by reasoning from these summaries. For information selection, LLMs identify the same key personality factors as trained algorithms, though they fail to differentiate item importance within factors. The resulting compressed summaries are not merely redundant representations but capture synergistic information: adding them to the original scores enhances prediction alignment, suggesting that they encode emergent, second-order patterns of trait interplay. Our findings demonstrate that LLMs can precisely predict individual participants' psychological traits from minimal data through a process of abstraction and reasoning. They offer both a powerful tool for psychological simulation and valuable insights into the emergent reasoning capabilities of LLMs.
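The synergy claim states that adding the summaries to the original scores improves prediction alignment. This has the shape of a standard nested-model (incremental $R^2$) comparison, and the sketch below illustrates that comparison under several assumptions. First, the natural-language summaries are assumed to have already been converted into numeric features, for example by coding or embedding them. That conversion step is hypothetical and not described here. Second, "alignment" is stood in for by the in-sample $R^2$ of an ordinary least-squares fit, which may differ from the paper's actual metric. Third, all function names are illustrative. In the toy example, the target is a pure product of two raw scores. It represents a second-order pattern of trait interplay that a linear model on raw scores alone cannot express.

```python
# Sketch under assumptions: summary-derived features are hypothetical numeric
# encodings of the text summaries; "alignment" is proxied by in-sample OLS R^2.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_r2(features, y):
    """In-sample R^2 of an ordinary least-squares fit of y on features plus an intercept."""
    rows = [[1.0] + list(r) for r in features]
    p = len(rows[0])
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(ata, aty)
    fitted = [sum(bi * v for bi, v in zip(beta, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def incremental_r2(raw, summary, y):
    """Compare raw-score features alone against raw + summary-derived features."""
    base = ols_r2(raw, y)
    full = ols_r2([list(r) + list(s) for r, s in zip(raw, summary)], y)
    return base, full, full - base


if __name__ == "__main__":
    # Toy data: the target is the product of two raw scores (a pure interaction).
    raw = [[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]]
    summary = [[a * b] for a, b in raw]  # hypothetical summary-derived feature
    y = [a * b for a, b in raw]
    print(incremental_r2(raw, summary, y))  # prints (0.0, 1.0, 1.0)
```

On this toy data, the raw-score model explains none of the variance, while adding the interaction-like summary feature explains all of it. In-sample $R^2$ can never decrease when predictors are added to a nested OLS model. A real analysis would therefore compare the two models on held-out participants or with an adjusted criterion, rather than on the fitting sample.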
