Refactor Analysis: Predictive Evaluations of Factor Models and Dimensionality

Michael Hardy

Abstract

Unidimensional factor models justify some of the most consequential summaries in science -- single scores, single ranks, and single leaderboards -- yet unidimensionality is usually assessed indirectly, by fitting and evaluating models on images of the data (e.g., correlation matrices) rather than on the response matrix itself. We introduce Refactor analysis, a data-first evaluation paradigm that converts a one-factor solution into a rank-1 prediction of the original matrix by estimating both respondent- and item-side structure from dual association images. We further introduce Verifactor analysis, which evaluates the same construction under bi-cross-validated (BCV) row-column partitions for improved generalization. In simulations where the data-generating mechanism is truly rank-1 and correlational, Refactor metrics align with classical unidimensionality indices, validating the approach. However, across 200 public dichotomous datasets, traditional fit and unidimensionality measures, though highly intercorrelated, are only weakly related to data recoverability, especially out of sample. This gap exposes a methodological vulnerability: excellent image-based fit can coexist with poor data-level explanatory power. Finally, treating the association measure itself as a testable hypothesis, we compare the $\phi$, tetrachoric, and quadrant ($q^\prime$) correlations, the last being an important reintroduction. Quadrant correlation emerges as a simple, interpretable, and remarkably robust alternative, yielding consistently stronger reconstruction and more stable behavior under sample-size variation than commonly used correlations. Together, Refactor and Verifactor shift unidimensionality assessment from "does a one-factor model fit the correlation matrix?" to the question that matters for measurement and benchmarking: does a one-factor dependence structure recover and generalize the observed responses?

Paper Structure (113 sections, 23 theorems, 34 equations, 30 figures, 4 tables)

Key Result

Proposition 2.1

Among all monotone transformations $g\in\mathcal{G}$ applied entrywise to $\widehat{X}$, isotonic regression achieves the minimal residual sum of squares in Eq. \ref{eq:isotonic_fit}. Consequently, $R^2_{\mathrm{iso}}(X,\widehat{X})$ is the largest achievable $R^2$ obtainable from $\widehat{X}$ under the sole
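
The quantity in this proposition can be illustrated with a minimal sketch. It assumes that "monotone" means nondecreasing, that the objective in the referenced equation is the ordinary residual sum of squares $\sum (X_{ij} - g(\widehat{X}_{ij}))^2$, and that $R^2_{\mathrm{iso}} = 1 - \mathrm{RSS}/\mathrm{TSS}$; none of these details is stated on this page. The function names `isotonic_fit` and `r2_iso` and the toy matrices are hypothetical. The fit uses the pool-adjacent-violators algorithm (PAVA), a standard way to compute isotonic regression.

```python
def isotonic_fit(pred, obs):
    """Least-squares nondecreasing fit of obs as a function of pred (PAVA)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])

    # Stage 1: pool tied predictions, since g must map equal inputs
    # to one output. Each group is [sum of obs, count, original indices].
    groups = []
    for i in order:
        if groups and pred[groups[-1][2][-1]] == pred[i]:
            g = groups[-1]
            g[0] += obs[i]
            g[1] += 1
            g[2].append(i)
        else:
            groups.append([obs[i], 1, [i]])

    # Stage 2: pool adjacent violators until block means are nondecreasing.
    blocks = []
    for total, count, idx in groups:
        blocks.append([total, count, list(idx)])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            t, c, ix = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2].extend(ix)

    fitted = [0.0] * len(pred)
    for total, count, idx in blocks:
        for i in idx:
            fitted[i] = total / count
    return fitted


def r2_iso(X, X_hat):
    """Assumed definition: 1 - RSS/TSS after the best monotone map of X_hat."""
    obs = [x for row in X for x in row]
    pred = [p for row in X_hat for p in row]
    fitted = isotonic_fit(pred, obs)
    mean = sum(obs) / len(obs)
    rss = sum((o - f) ** 2 for o, f in zip(obs, fitted))
    tss = sum((o - mean) ** 2 for o in obs)
    return 1.0 - rss / tss


# Toy data (illustrative only): binary responses and a continuous reconstruction.
X = [[0, 0, 1], [0, 1, 1]]
X_hat = [[0.1, 0.4, 0.8], [0.3, 0.35, 0.9]]

# RSS of the identity map g(x) = x, for comparison with the isotonic fit.
rss_identity = sum((x - p) ** 2
                   for rx, rp in zip(X, X_hat) for x, p in zip(rx, rp))
print(r2_iso(X, X_hat), rss_identity)
```

On this toy input the isotonic fit leaves a residual sum of squares of 0.5, below the 0.7325 left by the identity map, which is the ordering the proposition asserts for any monotone competitor.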

Figures (30)

  • Figure 1: Refactor Analyses are useful for testing assumptions of factor models. Using a large number of datasets, we can test the general application of psychometric factor models.
  • Figure 2: Example Simulation: (top) true unidimensional loadings for columns (left) and rows (right) vs. their estimates, $\hat{v}$ and $\hat{u}$, respectively. (bottom) Refactor rank-1 reconstruction where the data-generating model reflects rank-1 tetrachoric correlations. The data and its reconstruction are compared via $m(X,\hat{X})$, yielding a Refactor metric. While in this case the reconstruction has been binarized for presentation, Refactor reconstructions are typically continuous: see Figure \ref{fig:reconstruction_diagram} for the continuous representations.
  • Figure 3: Example Response Matrix and Refactor and Verifactor Reconstructions
  • Figure 4: Refactor and Verifactor Workflows. (a) Starting from the observed response matrix $X \in \mathbb{R}^{n\times p}$, we (b) form an association matrix "image" $A$ (see Section \ref{sec:test_corr}) by capturing the signal of interest on both axes (see Section \ref{sec:setup-refactor}): (ii) $A_c$ (columns) and (iii) $A_r$ (rows). (c) Standard dimensionality reduction techniques $\textsf{Z}$ focus on this signal image derived from $\boldsymbol{X}$ (e.g., the covariance/correlation matrix $\boldsymbol{X}^T\boldsymbol{X}$) to produce column-space projection loadings, $\boldsymbol{\hat{v}}$. Refactoring extends this by performing a dual analysis on the matrix transpose to produce row-space projection loadings, $\boldsymbol{\hat{u}}$. (d) These two loading matrices are then used to reconstruct a prediction of the original data matrix, $\boldsymbol{\hat{u}}\boldsymbol{\hat{v}}^\top= \boldsymbol{\hat{X}}$ (see Section \ref{sec:refactor_def}). Finally, (e) Refactor Analyses evaluate the model by quantifying the correspondence between the observed data $\boldsymbol{X}$ and the refactored data $\boldsymbol{\hat{X}}$ using various (i) matrix comparison metrics, thereby assessing the model's ability to preserve the signal in the original data. Verifactor Analyses (iv) extend this paradigm to out-of-sample bi-cross-validated prediction by using limited-information projections calculated from individual partitioned submatrices, $i \in \Pi$, of $\boldsymbol{X}$ to reconstruct low-rank approximations for held-out submatrices (see Section \ref{sec:verifactor_def}).
  • Figure 5: Proportion of Explained Common Variance (ECV) vs. proportion of variance explained by the best monotone rank-1 reconstruction, given the reconstruction $\widehat{X}$, across three conditions. (left) Simulation Study I: 1000 simulations where the underlying data-generating models (DGMs) are unidimensional tetrachoric correlations (see Section \ref{sec:simple_sim}). (middle) Simulation Study II: a hierarchical DGM with minor noise factors, following revelle_unidim_2025 (see Section \ref{sec:unidim_reprod}). (right) Empirical Study: 200 publicly available empirical datasets from the Item Response Warehouse (see Section \ref{sec:empirical}). (top) Refactor Analysis and (bottom) Verifactor out-of-sample bi-cross-validated prediction. Color represents the different correlational relationships.
  • ...and 25 more figures
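
The workflow in the Figure 4 caption (association images on both axes, loadings $\boldsymbol{\hat{v}}$ and $\boldsymbol{\hat{u}}$, the rank-1 product $\boldsymbol{\hat{u}}\boldsymbol{\hat{v}}^\top$, then a comparison $m(X,\hat{X})$) might be sketched as follows. This is a rough illustration under several assumptions, not the paper's implementation: the association measure is Pearson correlation (which coincides with $\phi$ on 0/1 data), the factoring step is a leading eigenvector scaled by the square root of its eigenvalue, the sign convention forces loadings to sum to a nonnegative value, and the comparison metric is the plain correlation between the entries of $X$ and $\hat{X}$. The helper names and the simulated data are hypothetical.

```python
import numpy as np


def association(M):
    """Pearson correlations between the columns of M (equals phi for 0/1 data)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        A = np.corrcoef(M, rowvar=False)
    # Constant columns have undefined correlations; treat them as unassociated.
    return np.nan_to_num(A, nan=0.0)


def leading_loading(A):
    """Leading eigenvector of a symmetric matrix, scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    w = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
    return w if w.sum() >= 0 else -w          # sign convention (an assumption)


def refactor_rank1(X):
    """Rank-1 prediction u_hat v_hat^T built from the two association images."""
    v_hat = leading_loading(association(X))      # column side, from A_c (p x p)
    u_hat = leading_loading(association(X.T))    # row side, from A_r (n x n)
    return np.outer(u_hat, v_hat)


# Hypothetical simulated responses: one latent trait, logistic item curves.
rng = np.random.default_rng(0)
n, p = 200, 12
theta = rng.normal(size=(n, 1))               # respondent trait
b = np.linspace(-1.5, 1.5, p)                 # item difficulties
P = 1.0 / (1.0 + np.exp(-(1.5 * theta - b)))
X = (rng.random((n, p)) < P).astype(float)

X_hat = refactor_rank1(X)
# Scale-free stand-in for m(X, X_hat): correlation between matrix entries.
m = np.corrcoef(X.ravel(), X_hat.ravel())[0, 1]
print(X_hat.shape, m)
```

Because eigenvectors fix neither the scale nor the sign of the loadings, the sketch compares $X$ and $\hat{X}$ with a scale-free metric; a monotone-calibrated $R^2$, as in Proposition 2.1, would be another option.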

Theorems & Definitions (39)

  • Proposition 2.1: Optimality of isotonic calibration for monotone fit
  • proof
  • Proposition 1.1: Impossibility of Continuous, Injective Dimensionality Reduction
  • proof
  • Proposition 1.2: The Refactor Reconstruction
  • Proposition 1.3: Row/column duality for exact rank-$k$ matrices
  • proof
  • Theorem 1.4: Refactor consistency under correct low-rank structure
  • proof
  • Lemma 1.5: Self-consistency of exact rank-$k$ structure (owen_bi-cross-validation_2009)
  • ...and 29 more
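
The statement of Lemma 1.5 is not reproduced on this page. Assuming it is the bi-cross-validation identity from the cited source, namely that for a block partition $X = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $\mathrm{rank}(X) = \mathrm{rank}(D) = k$ the held-out block satisfies $A = B D^{+} C$, a small numerical check might look like the sketch below. The matrix sizes, the block split, and the helper `pinv_rank_k` are hypothetical choices made for illustration.

```python
import numpy as np


def pinv_rank_k(D, k):
    """Pseudoinverse of the rank-k truncated SVD of D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T


rng = np.random.default_rng(1)
n, p, k = 8, 6, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))   # exact rank-k matrix

r, c = 3, 2                        # held-out block: first r rows, first c columns
A, B = X[:r, :c], X[:r, c:]
C, D = X[r:, :c], X[r:, c:]

# Predict the held-out block from the three retained blocks only.
A_hat = B @ pinv_rank_k(D, k) @ C
print(np.allclose(A, A_hat))
```

With noisy data the identity holds only approximately, which is what makes the held-out residual $A - \hat{A}$ usable as an out-of-sample error in a row-column partition scheme.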