Refactor Analysis: Predictive Evaluations of Factor Models and Dimensionality

Michael Hardy

Abstract

Unidimensional factor models justify some of the most consequential summaries in science -- single scores, single ranks, and single leaderboards -- yet unidimensionality is usually assessed indirectly by fitting and evaluating models on images of the data (e.g., correlation matrices) rather than on the response matrix itself. We introduce Refactor analysis, a data-first evaluation paradigm that converts a one-factor solution into a rank-1 prediction of the original matrix by estimating both respondent- and item-side structure from dual association images. We further introduce Verifactor analysis, which evaluates the same construction under bi-cross-validated (BCV) row-column partitions for improved generalization. In simulations where the data-generating mechanism is truly rank-1 and correlational, Refactor metrics align with classical unidimensionality indices, validating the approach. However, across 200 public dichotomous datasets, traditional fit and unidimensionality measures, though highly intercorrelated, are weakly related to data recoverability, especially out of sample. This gap exposes a methodological vulnerability: excellent image-based fit can coexist with poor data-level explanatory power. Finally, treating the association measure itself as a testable hypothesis, we compare the $\phi$, tetrachoric, and quadrant ($q^\prime$) correlations, the last of which is an important reintroduction. Quadrant correlation emerges as a simple, interpretable, and remarkably robust alternative, yielding consistently stronger reconstruction and more stable behavior under sample-size variation than commonly used correlations. Together, Refactor and Verifactor shift unidimensionality assessment from "does a one-factor model fit the correlation matrix?" to the question that matters for measurement and benchmarking: does a one-factor dependence structure recover and generalize the observed responses?

Paper Structure (113 sections, 23 theorems, 34 equations, 30 figures, 4 tables)

Introduction
Unidimensionality, a rank-1 hypothesis, is rarely tested where it lives
Image-based fit can be self-confirming
How can this create a methodological vulnerability?
The "Refactor" approach
Refactor, Verifactor, and model evaluation
Implications for unidimensionality testing
Outline
Methods
Refactor and Verifactor leverage dual random effects and recoverability metrics
Refactor Analyses
Verifactor Analyses
Verifactor prediction for two-way random-effects
Evaluation Methods
Refactor Analysis and Recoverability of Data
...and 98 more sections

Key Result

Proposition 2.1

Among all monotone transformations $g\in\mathcal{G}$ applied entrywise to $\widehat{X}$, isotonic regression achieves the minimal residual sum of squares in Eq. \ref{eq:isotonic_fit}. Consequently, $R^2_{\mathrm{iso}}(X,\widehat{X})$ is the largest achievable $R^2$ obtainable from $\widehat{X}$ under the sole
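
The quantity in Proposition 2.1 can be illustrated with a minimal sketch. It assumes that "monotone" means non-decreasing and that the fit is an ordinary least-squares isotonic regression of the entries of $X$ on the entries of $\widehat{X}$, computed here with a small pool-adjacent-violators routine. The function names `_pava_blocks` and `r2_iso` are hypothetical and not taken from the paper.

```python
import numpy as np

def _pava_blocks(sums, weights):
    """Weighted pool-adjacent-violators: non-decreasing least-squares block means."""
    bs, bw, bn = [], [], []  # block sum, block weight, number of points in block
    for s, w in zip(sums, weights):
        bs.append(float(s)); bw.append(float(w)); bn.append(1)
        # Merge the last two blocks while their means violate monotonicity.
        while len(bs) > 1 and bs[-2] / bw[-2] > bs[-1] / bw[-1]:
            s2, w2, n2 = bs.pop(), bw.pop(), bn.pop()
            bs[-1] += s2; bw[-1] += w2; bn[-1] += n2
    return np.concatenate([np.full(n, s / w) for s, w, n in zip(bs, bw, bn)])

def r2_iso(X, X_hat):
    """R^2 of the best non-decreasing entrywise map g(X_hat) for predicting X."""
    x = np.asarray(X_hat, dtype=float).ravel()
    y = np.asarray(X, dtype=float).ravel()
    # Tied predictions must share one fitted value, so pool them first.
    _, inv = np.unique(x, return_inverse=True)  # unique values come back sorted
    inv = inv.ravel()
    counts = np.bincount(inv).astype(float)
    sums = np.bincount(inv, weights=y)
    fitted = _pava_blocks(sums, counts)[inv]
    rss = np.sum((y - fitted) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

# Toy check: a strictly increasing distortion of X_hat is recovered exactly.
rng = np.random.default_rng(0)
X_hat_demo = rng.normal(size=(20, 15))
r2_cubic = r2_iso(X_hat_demo ** 3, X_hat_demo)

# Noisy binary responses: the isotonic R^2 is at least the squared correlation
# whenever that correlation is positive, because an increasing line is monotone.
X_bin = (X_hat_demo + rng.normal(size=X_hat_demo.shape) > 0).astype(float)
r2_binary = r2_iso(X_bin, X_hat_demo)
corr = np.corrcoef(X_bin.ravel(), X_hat_demo.ravel())[0, 1]
```

Because the constant function is itself monotone, this $R^2$ is never negative, which makes it a convenient upper bound on what any order-preserving recalibration of $\widehat{X}$ can explain.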

Figures (30)

Figure 1: Refactor Analyses are useful for testing assumptions of factor models. Using a large number of datasets, we can test the general application of psychometric factor models.
Figure 2: Example Simulation: (top) true unidimensional loadings for columns (left) and rows (right) vs. their estimates, $\hat{v}$ and $\hat{u}$, respectively. (bottom) Refactor rank-1 reconstruction where the data-generating model reflects rank-1 tetrachoric correlations. The data and its reconstruction are compared via $m(X,\hat{X})$, yielding a Refactor metric. While in this case the reconstruction has been transformed into binary for presentation, Refactor reconstructions are typically continuous: see Figure \ref{fig:reconstruction_diagram} for the continuous representations.
Figure 3: Example Response Matrix and Refactor and Verifactor Reconstructions
Figure 4: Refactor and Verifactor Workflows. (a) Starting from the observed response matrix $X \in \mathbb{R}^{n\times p}$, we (b) form an association matrix "image" $A$ (see Section \ref{sec:test_corr}) by capturing the signal of interest on both axes (see Section \ref{sec:setup-refactor}): (ii) $A_c$ (columns) and (iii) $A_r$ (rows). (c) Standard dimensionality reduction techniques $\textsf{Z}$ focus on this signal image derived from $\boldsymbol{X}$ (e.g., the covariance/correlation matrix $\boldsymbol{X}^\top\boldsymbol{X}$) to produce column-space projection loadings, $\boldsymbol{\hat{v}}$. Refactoring extends this by performing a dual analysis on the matrix transpose to produce row-space projection loadings, $\boldsymbol{\hat{u}}$. (d) These two loading matrices are then used to reconstruct a prediction of the original data matrix, $\boldsymbol{\hat{u}}\boldsymbol{\hat{v}}^\top= \boldsymbol{\hat{X}}$ (see Section \ref{sec:refactor_def}). Finally, (e) Refactor Analyses evaluate the model by quantifying the correspondence between the observed data $\boldsymbol{X}$ and the refactored data $\boldsymbol{\hat{X}}$ using various (i) matrix comparison metrics, thereby assessing the model's ability to preserve the signal in the original data. Verifactor Analyses (iv) extend this paradigm to out-of-sample bi-cross-validated prediction by using limited-information projections calculated from individual partitioned submatrices, $i \in \Pi$, of $\boldsymbol{X}$ to reconstruct low-rank approximations for held-out submatrices (see Section \ref{sec:verifactor_def}).
Figure 5: Proportion of Explained Common Variance (ECV) vs. Proportion of Variance Explained by the best monotone rank-1 reconstruction, $\widehat{X}$, across three different conditions. (left) Simulation Study I: 1000 simulations where the underlying data-generating models (DGM) are unidimensional tetrachoric correlations (see Section \ref{sec:simple_sim}). (middle) Simulation Study II: a hierarchical DGM with minor noise factors, following revelle_unidim_2025 (see Section \ref{sec:unidim_reprod}). (right) Empirical Study: 200 publicly available empirical datasets from the Item Response Warehouse (see Section \ref{sec:empirical}). (top) Refactor Analysis and (bottom) Verifactor out-of-sample bi-cross-validated prediction. Color represents different correlational relationships.
...and 25 more figures
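
The workflow described in the Figure 4 caption can be sketched as follows. This is one possible reading under stated assumptions, not the paper's estimator: it uses Pearson ($\phi$, for 0/1 data) correlations for both images $A_c$ and $A_r$, takes the leading eigenvector of each image as the technique $\textsf{Z}$, and uses the Pearson correlation between entries as the comparison metric $m(X,\hat{X})$. The paper's centering, scaling, and choice of association measure may differ, and the names `association`, `leading_vector`, and `refactor_rank1` are hypothetical.

```python
import numpy as np

def association(M):
    """Pearson correlations between the columns of M (phi for 0/1 data)."""
    Z = M - M.mean(axis=0)
    sd = Z.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns get zero correlations instead of NaN
    Z = Z / sd
    return Z.T @ Z / M.shape[0]

def leading_vector(A):
    """Leading eigenvector of a symmetric image, oriented to a non-negative sum."""
    _, vecs = np.linalg.eigh(A)  # eigenvalues are returned in ascending order
    v = vecs[:, -1]
    return v if v.sum() >= 0 else -v

def refactor_rank1(X):
    """Rank-1 prediction u_hat v_hat^T built from the two association images."""
    X = np.asarray(X, dtype=float)
    v_hat = leading_vector(association(X))    # column (item) side, length p
    u_hat = leading_vector(association(X.T))  # row (respondent) side, length n
    return np.outer(u_hat, v_hat), u_hat, v_hat

# Hypothetical toy data: one latent trait driving dichotomous responses.
rng = np.random.default_rng(0)
theta = rng.normal(size=200)        # respondent trait levels
b = np.linspace(-1.5, 1.5, 12)      # item difficulties
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random(P.shape) < P).astype(float)

X_hat, u_hat, v_hat = refactor_rank1(X)
# One simple choice of comparison metric m(X, X_hat): entrywise correlation.
m = np.corrcoef(X.ravel(), X_hat.ravel())[0, 1]
```

The reconstruction is only defined up to scale in this sketch, so a scale-free comparison (a correlation, or an $R^2$ after monotone recalibration) is the natural way to score it.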

Theorems & Definitions (39)

Proposition 2.1: Optimality of isotonic calibration for monotone fit
proof
Proposition 1.1: Impossibility of Continuous, Injective Dimensionality Reduction
proof
Proposition 1.2: The Refactor Reconstruction
Proposition 1.3: Row/column duality for exact rank-$k$ matrices
proof
Theorem 1.4: Refactor consistency under correct low-rank structure
proof
Lemma 1.5: Self-consistency of exact rank-$k$ structure (owen_bi-cross-validation_2009)
...and 29 more
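
The self-consistency property named in Lemma 1.5 can be checked numerically. The sketch below assumes the standard bi-cross-validation block layout, in which $X$ is partitioned as $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $A$ held out, and the prediction is $\widehat{A} = B D_k^{+} C$, where $D_k^{+}$ is the pseudoinverse of the rank-$k$ truncation of $D$. It operates on the data matrix directly rather than on association images, so it illustrates the identity behind the lemma and not the Verifactor estimator itself. The function name `bcv_predict` is hypothetical.

```python
import numpy as np

def bcv_predict(B, C, D, k):
    """Predict the held-out block A from B, C and the rank-k truncation of D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Pseudoinverse of the rank-k approximation of D (explicit truncation avoids
    # inverting numerically tiny singular values).
    D_k_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    return B @ D_k_pinv @ C

# Exact rank-k matrix: the held-out block is recovered when rank(D) = k.
rng = np.random.default_rng(1)
n, p, k = 40, 30, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))

r, c = 10, 8                      # held-out rows and columns
A, B = X[:r, :c], X[:r, c:]
C, D = X[r:, :c], X[r:, c:]

A_hat = bcv_predict(B, C, D, k)
max_abs_error = np.max(np.abs(A_hat - A))
```

With noisy or not exactly low-rank data the recovery is no longer exact, and the size of the held-out error is what a bi-cross-validated evaluation measures.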
