Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification

Bo Wang, Yuxuan Zhang, Yueqin Hu, Hanchao Hou, Kaiping Peng, Shiguang Ni

TL;DR

This paper introduces a semantic, response-free front-end for psychological scale simplification. The framework encodes item text with contextual embeddings, applies dimensionality reduction, and uses density-based clustering to reveal latent semantic structure; it then generates interpretable semantic factors and selects representative items without respondent data. Across DASS, IPIP, and EPOCH-CN, the semantic short forms maintained meaningful factor structures, reduced scale length substantially (by 50% or more in some cases), and preserved inter-factor relations and cross-form concordance, as evidenced by CFA fit indices and alignment metrics (e.g., ARI values up to 1.00). The work also provides practical guidance, robustness analyses, visualization tools, and an open-source one-click platform to facilitate adoption, positioning semantic structure as a transparent front-end that complements traditional psychometric validation.
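
To make the alignment metric concrete, the following is a minimal sketch of how an ARI value can be computed, assuming ARI here denotes the adjusted Rand index between two groupings of the same items (for example, semantic clusters versus the original subscale assignment). The function name and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.

    Hypothetical helper for illustration; the summary above does not
    say which implementation the authors used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs to compare

    # Count item pairs that fall together in both, in A only, in B only.
    contingency = Counter(zip(labels_a, labels_b))
    pairs_both = sum(comb(c, 2) for c in contingency.values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (maximum - expected)


# Toy example: the same grouping under different label names.
semantic_clusters = [0, 0, 1, 1, 2, 2]
original_subscales = ["x", "x", "y", "y", "z", "z"]
ari_identical = adjusted_rand_index(semantic_clusters, original_subscales)
```

Because the index depends only on which items are grouped together, not on the label names, two identical groupings score 1.00, and chance-level agreement scores about 0.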

Abstract

Psychological scale refinement traditionally relies on response-based methods such as factor analysis, item response theory, and network psychometrics to optimize item composition. Although rigorous, these approaches require large samples and may be constrained by data availability and cross-cultural comparability. Recent advances in natural language processing suggest that the semantic structure of questionnaire items may encode latent construct organization, offering a complementary response-free perspective. We introduce a topic-modeling framework that operationalizes semantic latent structure for scale simplification. Items are encoded using contextual sentence embeddings and grouped via density-based clustering to discover latent semantic factors without predefining their number. Class-based term weighting derives interpretable topic representations that approximate constructs and enable merging of semantically adjacent clusters. Representative items are selected using membership criteria within an integrated reduction pipeline. We benchmarked the framework across DASS, IPIP, and EPOCH, evaluating structural recovery, internal consistency, factor congruence, correlation preservation, and reduction efficiency. The proposed method recovered coherent factor-like groupings aligned with established constructs. Selected items reduced scale length by 60.5% on average while maintaining psychometric adequacy. Simplified scales showed high concordance with original factor structures and preserved inter-factor correlations, indicating that semantic latent organization provides a response-free approximation of measurement structure. Our framework formalizes semantic structure as an inspectable front-end for scale construction and reduction. To facilitate adoption, we provide a visualization-supported tool enabling one-click semantic analysis and structured simplification.
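
The representative-item step can be illustrated with a small sketch. The abstract states only that items are selected "using membership criteria", so the rule below (keep the item whose embedding is closest, by cosine similarity, to its cluster centroid) is a stand-in assumption rather than the authors' criterion. The two-dimensional vectors are toy values in place of real sentence embeddings, and treating the label -1 as noise follows a common convention of density-based clustering libraries, not anything stated in the paper.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_representatives(embeddings, labels, per_cluster=1, noise_label=-1):
    """Pick the items nearest each cluster centroid (assumed criterion)."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != noise_label:  # skip items the clusterer left unassigned
            groups[label].append(idx)

    selected = {}
    for label, members in groups.items():
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        ranked = sorted(
            members,
            key=lambda i: cosine(embeddings[i], centroid),
            reverse=True,
        )
        selected[label] = ranked[:per_cluster]
    return selected


# Toy "embeddings" for seven items; the last item is treated as noise.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.3],   # cluster 0
    [0.0, 1.0], [0.1, 0.9], [0.3, 0.8],   # cluster 1
    [0.7, 0.7],                           # unassigned
]
toy_labels = [0, 0, 0, 1, 1, 1, -1]
representatives = select_representatives(toy_embeddings, toy_labels)
```

In this toy case one item is kept per cluster, namely the middle item of each group. In practice the number kept per cluster controls the trade-off between scale length and construct coverage.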

Paper Structure

This paper contains 89 sections, 5 equations, 20 figures, and 16 tables.

Figures (20)

  • Figure 1: Overall framework
  • Figure 2: CFA graph of DASS (three-factor model)
  • Figure 3: CFA graph of DASS (one-factor model)
  • Figure 4: CFA graph of IPIP (five-factor model)
  • Figure 5: CFA graph of IPIP (one-factor model)
  • ...and 15 more figures