
A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis

Christopher J. Urban, Daniel J. Bauer

TL;DR

This paper investigates a deep learning-based variational inference (VI) algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors and that recovers results aligning with psychological theory across random starts.

Abstract

Marginal maximum likelihood (MML) estimation is the preferred approach to fitting item response theory models in psychometrics due to the MML estimator's consistency, normality, and efficiency as the sample size tends to infinity. However, state-of-the-art MML estimation procedures such as the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm, as well as approximate MML estimation procedures such as variational inference (VI), are computationally time-consuming when the sample size and the number of latent factors are very large. In this work, we investigate a deep learning-based VI algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors. The proposed approach applies a deep artificial neural network model called an importance-weighted autoencoder (IWAE) to exploratory IFA. The IWAE approximates the MML estimator using an importance sampling technique wherein increasing the number of importance-weighted (IW) samples drawn during fitting improves the approximation, typically at the cost of decreased computational efficiency. We provide a real-data application that recovers results aligning with psychological theory across random starts. Via simulation studies, we show that the IWAE yields more accurate estimates as either the sample size or the number of IW samples increases (although factor correlation and intercept estimates exhibit some bias) and obtains results similar to those of MH-RM in less time. Our simulations also suggest that the proposed approach performs similarly to, and is potentially faster than, constrained joint maximum likelihood estimation, a fast procedure that is consistent when the sample size and the number of items simultaneously tend to infinity.
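The claim that drawing more IW samples improves the approximation can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a one-factor two-parameter logistic (2PL) model with a $N(0, 1)$ prior, and it uses a fixed, deliberately imprecise Gaussian proposal $q(\theta \mid \mathbf{x})$ in place of an encoder network. All item parameters, the response pattern, and the proposal are hypothetical. For each $K$ it averages the $K$-sample bound $\log \frac{1}{K}\sum_{k=1}^{K} p(\mathbf{x}, \theta_k) / q(\theta_k \mid \mathbf{x})$ over many draws and compares the result with $\log p(\mathbf{x})$, which is computed by numerical integration over $\theta$. In expectation the bound never exceeds $\log p(\mathbf{x})$, and it tightens as $K$ grows.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_joint(x, theta, a, d):
    """log p(x, theta) for a one-factor 2PL model with a N(0, 1) prior on theta."""
    lp = -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)  # log N(theta; 0, 1)
    for xj, aj, dj in zip(x, a, d):
        p = sigmoid(aj * theta + dj)  # P(x_j = 1 | theta)
        lp += math.log(p) if xj == 1 else math.log(1.0 - p)
    return lp


def log_normal(z, mu, sigma):
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def iw_bound(x, a, d, mu, sigma, K, rng):
    """One Monte Carlo draw of the K-sample importance-weighted bound on log p(x)."""
    log_w = []
    for _ in range(K):
        theta = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterized draw from q
        log_w.append(log_joint(x, theta, a, d) - log_normal(theta, mu, sigma))
    return logsumexp(log_w) - math.log(K)


def exact_log_marginal(x, a, d, lo=-8.0, hi=8.0, n=16001):
    """Reference log p(x) from a fine Riemann sum over theta (feasible for one factor)."""
    h = (hi - lo) / (n - 1)
    total = sum(math.exp(log_joint(x, lo + i * h, a, d)) for i in range(n)) * h
    return math.log(total)


# Hypothetical item parameters, response pattern, and imprecise proposal q = N(mu, sigma^2).
a = [1.2, 0.8, 1.5, 1.0, 0.6, 1.3]   # loadings
d = [0.0, -0.5, 0.5, 1.0, -1.0, 0.2]  # intercepts
x = [1, 0, 1, 1, 0, 1]
mu, sigma = 0.2, 1.0

rng = random.Random(0)
exact = exact_log_marginal(x, a, d)
avg_bound = {}
for K in (1, 5, 50):
    draws = [iw_bound(x, a, d, mu, sigma, K, rng) for _ in range(2000)]
    avg_bound[K] = sum(draws) / len(draws)

for K, value in avg_bound.items():
    print(f"K={K:>2}: mean IW bound {value:.4f}   exact log p(x) {exact:.4f}")
```

With $K = 1$ the bound reduces to the ordinary single-sample evidence lower bound. As $K$ increases, the averaged bound should approach the exact log marginal likelihood, and each draw costs proportionally more computation. This mirrors the accuracy/efficiency trade-off described in the abstract.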

Paper Structure

This paper contains 29 sections, 35 equations, 12 figures, 1 table, and 1 algorithm.

Figures (12)

  • Figure 1: Schematic representation of a feedforward neural network with a single hidden layer. The input layer is a $3 \times 1$ vector, the hidden layer is a $4 \times 1$ vector, and the output layer is a $2 \times 1$ vector. Case subscripts $i$ are omitted to avoid clutter.
  • Figure 2: Schematic diagram of a variational autoencoder for item factor analysis with $J = 6$ items, $C_j = 2$ categories per item, $P = 2$ factors, $S = 1$ Monte Carlo sample from the approximate latent variable posterior, and an inference model consisting of a feedforward neural network with a single hidden layer. The reparameterization trick is not illustrated for simplicity. LV = latent variable.
  • Figure 3: Scree plot of predicted approximate negative log-likelihood as a function of the number of latent factors. The "elbow" at $5$ factors is marked with a dotted line.
  • Figure 4: Heat map of factor loadings for IPIP-FFM items. EXT = extraversion, EST = emotional stability, AGR = agreeableness, CON = conscientiousness, OPN = openness.
  • Figure 5: Parameter bias for amortized importance-weighted variational inference (IWVI), computed over $100$ simulation replications. Three settings for the number of importance-weighted (IW) samples are compared.
  • ...and 7 more figures
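Figures 1 and 2 describe a single-hidden-layer feedforward inference network that feeds a logistic item response model for binary items. The plain-Python sketch below traces one forward pass of that setup with $J = 6$, $P = 2$, and $S = 1$. Several details are assumptions and are not taken from the paper: the ELU activation, the hidden width of 4, untrained random weights, a diagonal Gaussian $q(\mathbf{z} \mid \mathbf{x})$, and independent standard normal factors (the paper's model also estimates factor correlations). The helper names (`dense`, `init_params`, `elbo_one_sample`) are hypothetical.

```python
import math
import random


def elu(v):
    return v if v > 0 else math.exp(v) - 1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b, act=None):
    """Affine layer W @ x + b with an optional elementwise activation."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [act(v) for v in out] if act else out


def log_normal(z, mu, sigma):
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def init_params(J, P, H, rng, scale=0.3):
    """Random, untrained weights; purely illustrative."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(H, J), "b1": [0.0] * H,                # encoder hidden layer
        "W2": mat(2 * P, H), "b2": [0.0] * (2 * P),      # encoder output: means, log-SDs
        "loadings": mat(J, P), "intercepts": [0.0] * J,  # decoder holds the item parameters
    }


def elbo_one_sample(x, params, P, rng):
    """Single-sample (S = 1) ELBO estimate for binary items with a logistic decoder."""
    # Inference model: x -> hidden layer (ELU) -> [mu, log_sigma] of q(z | x).
    h = dense(x, params["W1"], params["b1"], elu)
    out = dense(h, params["W2"], params["b2"])
    mu, log_sigma = out[:P], out[P:]
    sigma = [math.exp(s) for s in log_sigma]
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    # Generative model: P(x_j = 1 | z) = sigmoid(loadings_j . z + intercept_j).
    probs = [sigmoid(v) for v in dense(z, params["loadings"], params["intercepts"])]
    log_lik = sum(math.log(p) if xj == 1 else math.log(1.0 - p) for xj, p in zip(x, probs))
    log_prior = sum(log_normal(zp, 0.0, 1.0) for zp in z)  # independent N(0, 1) factors (assumed)
    log_q = sum(log_normal(zp, m, s) for zp, m, s in zip(z, mu, sigma))
    return log_lik + log_prior - log_q, z, probs


J, P, H = 6, 2, 4  # 6 binary items, 2 factors, 4 hidden units (H is an assumption)
rng = random.Random(1)
params = init_params(J, P, H, rng)
x = [1, 0, 1, 1, 0, 1]  # hypothetical response pattern
elbo, z, probs = elbo_one_sample(x, params, P, rng)
print(f"z = {z}, ELBO estimate = {elbo:.3f}")
```

In an actual fit, the encoder weights and the item parameters would be updated jointly by stochastic gradient ascent on this objective. Drawing $K > 1$ samples of $\mathbf{z}$ and combining their log weights with a log-mean-exp would turn it into the importance-weighted objective.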