
Neural network embeddings recover value dimensions from psychometric survey items on par with human data

Max Pellert, Clemens M. Lechner, Indira Sen, Markus Strohmaier

TL;DR

This work shows that neural embeddings, when processed with Survey and Questionnaire Item Embeddings Differentials (SQuID), can recover the latent value dimensions of the PVQ-RR with accuracy on par with human judgments, without domain-specific fine-tuning. By evaluating multiple embedding models against human rater data using Cronbach's alpha, dimension-dimension correlations, and multidimensional scaling with Procrustes alignment, the authors demonstrate substantial concordance with the human dimension-dimension similarities (R² ≈ 0.55) and a coherent circumplex structure. The approach generalizes to additional inventories (IPIP, BFI-2, HEXACO), where SQuID substantially widens the range of inter-item correlations, indicating broad applicability. Overall, SQuID offers a scalable, flexible, and cost-effective complement to traditional psychometric workflows and opens the door to broader cross-cultural and multilingual psychometrics with neural embeddings.
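Cronbach's alpha, the internal-consistency metric named above, is usually computed from a respondents-by-items score matrix, which fits the human data. Embeddings have no respondents, only item-item similarities, so one option there is the standardized alpha based on the mean inter-item correlation. The document does not say which variant the authors use; the following is a minimal NumPy sketch under that assumption, with hypothetical function names.

```python
import numpy as np


def cronbach_alpha(scores: np.ndarray) -> float:
    """Raw Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)


def standardized_alpha(item_corr: np.ndarray) -> float:
    """Standardized alpha from an (items x items) correlation matrix.

    alpha_std = k * r_bar / (1 + (k - 1) * r_bar), where r_bar is the mean
    off-diagonal correlation. Assumed here as the variant usable for embeddings,
    which yield item-item similarities but no respondents.
    """
    item_corr = np.asarray(item_corr, dtype=float)
    k = item_corr.shape[0]
    off_diagonal = item_corr[~np.eye(k, dtype=bool)]
    r_bar = off_diagonal.mean()
    return k * r_bar / (1.0 + (k - 1) * r_bar)
```

Restricting the similarity matrix to the items of one value dimension before calling `standardized_alpha` would give one internal-consistency estimate per dimension.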

Abstract

We demonstrate that embeddings derived from large language models, when processed with "Survey and Questionnaire Item Embeddings Differentials" (SQuID), can recover the structure of human values obtained from human rater judgments on the Revised Portrait Value Questionnaire (PVQ-RR). We compare multiple embedding models across several evaluation metrics, including internal consistency, dimension correlations and multidimensional scaling configurations. Unlike previous approaches, SQuID addresses the challenge of obtaining negative correlations between dimensions without requiring domain-specific fine-tuning or re-annotation of training data. Quantitative analysis reveals that our embedding-based approach explains 55% of the variance in the dimension-dimension similarities derived from human data. Multidimensional scaling configurations align with pooled human data from 49 countries. Generalizability tests across three personality inventories (IPIP, BFI-2, HEXACO) demonstrate that SQuID consistently increases correlation ranges, suggesting applicability beyond value theory. These results show that semantic embeddings can effectively replicate psychometric structures previously established through extensive human surveys. The approach offers substantial advantages in cost, scalability and flexibility while maintaining quality comparable to traditional methods. Our findings have significant implications for psychometrics and social science research, providing a complementary methodology that could expand the scope of human behavior and experience represented in measurement tools.
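The core SQuID operation, as the workflow in Figure 4 describes it, is to subtract the average embedding of all items from each item embedding and then average the centered item vectors within each dimension. One intuition for why this yields negative correlations: raw item embeddings share a common component, which keeps their similarities positive, and centering removes it. The minimal NumPy sketch below uses hypothetical function names; Pearson correlation across embedding coordinates as the similarity measure, and a simple regression over unique off-diagonal dimension pairs for the variance-explained figure, are assumptions rather than the authors' exact procedure.

```python
import numpy as np


def squid_dimension_vectors(item_embeddings, item_dimensions):
    """Center item embeddings on their mean, then average them per dimension.

    item_embeddings: (n_items, d) array, one embedding per survey item.
    item_dimensions: sequence of n_items dimension labels.
    Returns (sorted labels, (n_dims, d) array of dimension vectors).
    """
    emb = np.asarray(item_embeddings, dtype=float)
    # Subtract the average embedding of all items from each item embedding.
    centered = emb - emb.mean(axis=0, keepdims=True)
    labels = sorted(set(item_dimensions))
    dims = np.asarray(item_dimensions)
    # Aggregate the centered items into one vector per dimension by averaging.
    vectors = np.stack([centered[dims == label].mean(axis=0) for label in labels])
    return labels, vectors


def dimension_similarity(dimension_vectors):
    """Pearson correlation between dimension vectors across embedding
    coordinates (an assumed similarity measure)."""
    return np.corrcoef(dimension_vectors)


def variance_explained(sim_embeddings, sim_human):
    """R^2 of a simple linear model predicting human dimension-dimension
    similarities from embedding-derived ones, using each pair once."""
    sim_embeddings = np.asarray(sim_embeddings, dtype=float)
    sim_human = np.asarray(sim_human, dtype=float)
    upper = np.triu_indices_from(sim_embeddings, k=1)  # skip diagonal and duplicates
    x, y = sim_embeddings[upper], sim_human[upper]
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()
```

As a toy check, take two dimensions whose items are `shared + axis` and `shared - axis` for a non-constant `shared` vector: the raw dimension vectors are positively correlated, while the SQuID-treated ones come out perfectly anti-correlated.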

Paper Structure

This paper contains 26 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Relationship between item similarity (Pearson correlation) derived from SQuID-treated embeddings and from human data. A simple linear model explains more than half of the variance.
  • Figure 2: Unrotated MDS configuration derived from SQuID-treated embeddings. Because value dimensions are placed according to their distances from each other, a circular structure emerges. For fully spelled-out dimension names, see the scale-subtraction figure.
  • Figure 3: Comparison of MDS configurations from embeddings and from human data, shown after a Procrustes similarity transformation. Human data in red, embeddings in blue. Few connecting lines are long and few strongly overlap, indicating general similarity between two configurations derived from two distinct types of data. For fully spelled-out dimension names, see the scale-subtraction figure.
  • Figure 4: Schematic overview of our workflow. We use LLMs to create embeddings, subtract the average embedding of all items from each item embedding, and aggregate by dimension through averaging. For both the embeddings and the human data, we compute Cronbach's alpha and create similarity matrices to compare the two types of data. We run multidimensional scaling, rotate the configurations to maximum similarity, and compare the results both visually and quantitatively.
  • Figure 5: Replacing MDS with PCA. We plot the first two principal components of the embeddings. The clustering is similar to that of the MDS configuration, but the overall structure is less circular.
  • ...and 3 more figures
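Figures 2, 3 and 5 rest on three standard techniques: multidimensional scaling, a Procrustes similarity transformation, and PCA. The minimal NumPy sketch below assumes classical (Torgerson) metric MDS and the conversion d = sqrt(2(1 - r)) from correlations to distances; the paper may use a different MDS variant (for example ordinal MDS), and all function names are hypothetical.

```python
import numpy as np


def correlation_to_distance(corr):
    """Map correlations to distances with d = sqrt(2 * (1 - r)) (assumed conversion)."""
    corr = np.asarray(corr, dtype=float)
    return np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: double-center squared distances, keep top eigenpairs."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scales = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scales


def procrustes_align(target, source):
    """Similarity transformation of `source` onto `target`: translation,
    rotation (reflections allowed) and uniform scaling. Returns the aligned
    configuration and its residual sum of squares."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    t = target - target.mean(axis=0)
    s = source - source.mean(axis=0)
    u, singular_values, vt = np.linalg.svd(s.T @ t)
    rotation = u @ vt
    scale = singular_values.sum() / (s ** 2).sum()
    aligned = scale * s @ rotation + target.mean(axis=0)
    return aligned, float(((aligned - target) ** 2).sum())


def pca_2d(x):
    """Project the rows of x onto their first two principal components."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:2].T
```

In this setting one would compute `classical_mds(correlation_to_distance(r))` once for the embedding-derived similarity matrix and once for the human one, then align one configuration onto the other with `procrustes_align`; the residual sum of squares is one possible quantitative summary of the visual comparison in Figure 3.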