Maximum softly penalised likelihood in factor analysis

Philipp Sterzinger, Ioannis Kosmidis, Irini Moustaki

TL;DR

This paper tackles the prevalence of Heywood cases in exploratory factor analysis by introducing a maximum softly penalised likelihood (MSPL) framework. It derives general conditions under which penalised estimators exist and preserve key ML properties, and shows that penalties from Akaike (1987) and Hirose et al. (2011) satisfy these conditions when scaled appropriately to yield softly decaying penalties. The authors establish consistency and, under stronger identification, $\sqrt{n}$-consistency and asymptotic normality for MSPL estimators, demonstrating that soft penalties can avoid improper boundary solutions without compromising ML-type inference. Through extensive simulations and real-data applications, MSPL improves finite-sample performance, stabilises factor loading estimates and communality estimates, and yields more reliable model selection compared with naive penalisation or unpenalised ML. The framework provides a principled approach to mitigate Heywood cases while maintaining the desirable properties of ML estimation in factor analysis.

Abstract

Estimation in exploratory factor analysis often yields estimates on the boundary of the parameter space. Such occurrences, known as Heywood cases, are characterised by non-positive variance estimates and can cause issues in numerical optimisation procedures or convergence failures, which, in turn, can lead to misleading inferences, particularly regarding factor scores and model selection. We derive sufficient conditions on the model and a penalty to the log-likelihood function that i) guarantee the existence of maximum penalised likelihood estimates in the interior of the parameter space, and ii) ensure that the corresponding estimators possess the desirable asymptotic properties expected of the maximum likelihood estimator, namely consistency and asymptotic normality. Consistency and asymptotic normality are achieved when the penalisation is soft enough, in a way that adapts to the information accumulation about the model parameters. We formally show, for the first time, that the penalties of Akaike (1987) and Hirose et al. (2011) to the log-likelihood of the normal linear factor model satisfy the conditions for existence, and, hence, deal with Heywood cases. Their vanilla versions, though, can result in questionable finite-sample properties in estimation, inference, and model selection. The maximum softly penalised likelihood framework we introduce enables the careful scaling of those penalties to ensure that the resulting estimation and inference procedures inherit the ML estimator's optimal properties. Through comprehensive simulation studies and the analysis of real data sets, we illustrate the desirable finite-sample properties of the maximum softly penalised likelihood estimators and associated procedures.
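
To make the idea of soft penalisation concrete, the following Python sketch evaluates a penalised log-likelihood for a one-factor normal model. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy covariance matrix, and the penalty $-\tfrac{1}{2}\sum_j s_{jj}/\psi_j$ are hypothetical, chosen only because the penalty diverges to $-\infty$ as any unique variance $\psi_j$ approaches zero. The exact forms of the Akaike (1987) and Hirose et al. (2011) penalties are not reproduced here; only the $n^{-1/2}$ scaling mirrors the Akaike[$n^{-1/2}$] and Hirose[$n^{-1/2}$] labels used in the figure captions.

```python
import math

def loglik_one_factor(S, lam, psi, n):
    """Normal log-likelihood (up to an additive constant) of a one-factor
    model with Sigma = lam lam^T + diag(psi), given sample covariance S."""
    p = len(lam)
    w = [lam[j] / psi[j] for j in range(p)]            # Psi^{-1} lam
    d = 1.0 + sum(lam[j] * w[j] for j in range(p))     # 1 + lam^T Psi^{-1} lam
    # Matrix determinant lemma: log|Sigma| = log|Psi| + log(d)
    logdet = sum(math.log(v) for v in psi) + math.log(d)
    # Sherman-Morrison: tr(Sigma^{-1} S) = tr(Psi^{-1} S) - (w^T S w) / d
    tr = sum(S[j][j] / psi[j] for j in range(p))
    tr -= sum(w[j] * S[j][k] * w[k] for j in range(p) for k in range(p)) / d
    return -0.5 * n * (logdet + tr)

def penalty(S, psi):
    """Hypothetical penalty (an assumption): tends to -inf as any psi_j -> 0."""
    return -0.5 * sum(S[j][j] / psi[j] for j in range(len(psi)))

def soft_penalised_loglik(S, lam, psi, n):
    """Log-likelihood plus the penalty scaled by n**-0.5 (assumed soft scaling)."""
    return loglik_one_factor(S, lam, psi, n) + n ** -0.5 * penalty(S, psi)

# Toy sample covariance matrix (made up for illustration).
S = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
lam_i, psi_i = [0.7, 0.7, 0.7], [0.5, 0.5, 0.5]    # interior point
lam_b, psi_b = [1.0, 0.5, 0.5], [1e-8, 0.5, 0.5]   # near the boundary psi_1 = 0

def penalty_share(n):
    """Size of the scaled penalty relative to the log-likelihood at the interior point."""
    return abs(n ** -0.5 * penalty(S, psi_i)) / abs(loglik_one_factor(S, lam_i, psi_i, n))

for n in (100, 10_000):
    print(n,
          loglik_one_factor(S, lam_b, psi_b, n),      # unpenalised, near boundary
          soft_penalised_loglik(S, lam_b, psi_b, n),  # penalised, near boundary
          soft_penalised_loglik(S, lam_i, psi_i, n),  # penalised, interior
          penalty_share(n))
```

In this toy example the unpenalised log-likelihood remains finite near the boundary, whereas the penalised objective drops sharply there, so a maximiser is pushed into the interior. The penalty's share of the objective at the interior point shrinks as $n$ grows, which is the sense in which the penalisation is soft.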

Paper Structure

This paper contains 12 sections, 3 theorems, 13 equations, 3 figures, 3 tables.

Key Result

Theorem 4.1

Let $\boldsymbol{\Theta} = \left\{\boldsymbol{\theta} \in \Re^{p (q + 1)}: \theta_m > 0, m > pq \right\}$ and $\partial \boldsymbol{\Theta} = \{\boldsymbol{\theta} \in \Re^{p(q + 1)}: \exists m > pq, \theta_m = 0 \}$ and $\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}(\boldsymbol{\theta})$ [...] Then, the set of MPL estimates $\arg \max_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \ell^*(\boldsymbol{\theta})$ [...]

Figures (3)

  • Figure 1: Percentage of samples (out of $1000$) that have been identified as Heywood cases for ML ("None"), MPL with Akaike[$n$] and Hirose[$n$] penalties, and MSPL with Akaike[$n^{-1/2}$] and Hirose[$n^{-1/2}$] penalties, $n \in \left\{50,100,400\right\}$, and loading matrix settings $A_3$, $B_3$, $A_5$, $B_5$, $A_8$, and $B_8$.
  • Figure 2: Violin plots of estimates of $\log(|\mathrm{Bias}|)$ (top panel), $\log(\mathrm{RMSE})$ (middle panel) and probability of underestimation (bottom panel) for the elements of $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^\top$, for each estimator, $n \in \left\{50,100,400\right\}$, and loading matrix settings $A_3$ and $B_3$. The average over all elements for each setting is noted with a dot.
  • Figure 3: Percentage of times the model with $3$ factors is selected for each estimator, $n \in \{50, 400, 1000, 5000 \}$, and loading matrix settings $A_3$ and $B_3$, using AIC and BIC. The absence of vertical bars for the Akaike[$n$] and Hirose[$n$] based model selection procedures indicates that these methods never selected the correct model.

Theorems & Definitions (3)

  • Theorem 4.1: Existence of MPL estimates in factor analysis
  • Theorem 5.1
  • Theorem 5.2