Robust Estimation of Polychoric Correlation
Max Welz, Patrick Mair, Andreas Alfons
TL;DR
Polychoric correlation estimation is vulnerable to misspecification of the latent normality assumption. This work introduces a robust C-estimator that downweights poorly fitting contingency table cells via a tunable discrepancy function. The method generalizes ML, retains full efficiency under correct specification, and remains consistent and asymptotically normal under partial misspecification, all at no additional computational cost. In comprehensive simulations and an empirical Big Five application, the estimator is substantially robust to careless responding and can reveal sources of contamination via Pearson residuals. An open-source R package (robcat) implements the estimator to facilitate its use in SEMs, factor analysis, and related multivariate techniques for ordinal data.
Abstract
Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between the observed frequencies and the theoretical frequencies implied by the polychoric model. In contrast to the existing literature, our estimator makes no assumption about the type or degree of model misspecification. Furthermore, it generalizes ML estimation, is consistent and asymptotically normal, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application to a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially; further inspection suggests that this difference is likely due to the presence of careless respondents, whom the estimator helps identify.
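To make the ingredients named above concrete, the following Python sketch computes the cell probabilities implied by a polychoric model, fits the correlation by plain ML, and inspects Pearson residuals. It is an illustration under stated assumptions, not the authors' implementation and not the robcat API: the function names (`cell_probs`, `ml_rho`, `pearson_residuals`) are hypothetical, the thresholds are treated as known, the bivariate normal rectangles are approximated by Gauss-Legendre quadrature, and the residual is taken to be the relative deviation of the observed from the model-implied frequency. The robust loss itself is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def cell_probs(rho, a, b, n_nodes=64):
    """Cell probabilities implied by a latent bivariate normal with correlation rho.

    a, b: interior thresholds of the row and column variable (assumed known).
    """
    a = np.concatenate(([-8.0], a, [8.0]))        # finite stand-ins for -inf / +inf
    b = np.concatenate(([-np.inf], b, [np.inf]))
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    s = np.sqrt(1.0 - rho**2)
    P = np.empty((len(a) - 1, len(b) - 1))
    for i in range(len(a) - 1):
        lo, hi = a[i], a[i + 1]
        # Map the quadrature nodes from [-1, 1] to the row interval [lo, hi].
        x = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
        w = 0.5 * (hi - lo) * weights * norm.pdf(x)
        # Conditional CDF of the column variable at each threshold, given x.
        cond = norm.cdf((b[None, :] - rho * x[:, None]) / s)
        P[i] = w @ np.diff(cond, axis=1)
    P = np.clip(P, 1e-12, None)
    return P / P.sum()


def ml_rho(freq, a, b):
    """ML estimate of rho from relative cell frequencies, thresholds held fixed."""
    def nll(r):
        return -np.sum(freq * np.log(cell_probs(r, a, b)))
    return minimize_scalar(nll, bounds=(-0.99, 0.99), method="bounded").x


def pearson_residuals(freq, rho, a, b):
    """Relative deviation of observed from model-implied frequencies (assumed form)."""
    return freq / cell_probs(rho, a, b) - 1.0


# Hypothetical 4 x 3 table: exact model frequencies at rho = 0.5.
a = np.array([-1.0, 0.0, 1.0])
b = np.array([-0.5, 0.5])
clean = cell_probs(0.5, a, b)
rho_clean = ml_rho(clean, a, b)

# Move 10% of the mass into one discordant corner cell, a stylized stand-in
# for careless responding (an assumption made only for this illustration).
contaminated = 0.9 * clean
contaminated[0, -1] += 0.1
rho_cont = ml_rho(contaminated, a, b)
resid = pearson_residuals(contaminated, rho_cont, a, b)

print(f"ML on clean table:        {rho_clean:.3f}")
print(f"ML on contaminated table: {rho_cont:.3f}")
print(np.round(resid, 2))
```

In this toy setting the ML estimate is pulled away from 0.5 by the contaminated cell, and that cell shows a clearly positive residual. This is the kind of poorly fitting cell that a robust loss would downweight.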
