Multidimensional Item Response Theory in the Style of Collaborative Filtering

Yoav Bergner, Peter F. Halpin, Jill-Jênn Vie

TL;DR

By recasting multidimensional IRT within a collaborative-filtering framework, the paper provides a scalable predictive paradigm that subsumes Rasch, 2PL, and M2PL as special cases. It develops penalized joint maximum likelihood with $L^2$ penalties and uses two cross-validation schemes (striated and elementwise) to tune hyperparameters and select dimensionality for large, sparse data. Through simulations and real data (including a Force Concept Inventory and a MOOC dataset), it shows that high-dimensional latent spaces of up to $d=20$ dimensions can be learned efficiently and yield competitive predictive performance. When the latent factors are hard to interpret, the authors propose a recommender-style validation using auxiliary data, such as item popularity during an open-book exam, to assess substantive content relationships. The results suggest this approach is computationally fast, scales to large datasets, and provides actionable insights for predictive psychometrics and educational data mining.

Abstract

This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course (MOOC). The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative "validation" of the factor model, using auxiliary information about the popularity of items consulted during an open-book exam in the course.
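
To make the estimation idea concrete, the following is a minimal sketch of penalized JML for a compensatory multidimensional logistic model, followed by a small elementwise hold-out to compare dimensionalities. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_penalized_jml`, `penalized_nll`), the plain gradient-descent optimizer with count-scaled steps, the choice to leave item intercepts unpenalized, and all hyperparameter values are hypothetical.

```python
import numpy as np

def logits(theta, a, b):
    # z[i, j] = theta_i . a_j + b_j (student factors, item loadings, item intercepts)
    return theta @ a.T + b

def penalized_nll(Y, mask, theta, a, b, lam):
    # Bernoulli negative log-likelihood over observed entries plus an L2 penalty.
    # Assumption: intercepts b are not penalized.
    z = logits(theta, a, b)
    nll = np.sum(mask * (np.logaddexp(0.0, z) - Y * z))
    return nll + 0.5 * lam * (np.sum(theta**2) + np.sum(a**2))

def fit_penalized_jml(Y, mask, d=2, lam=0.1, lr=0.5, n_iter=300, seed=0):
    """Jointly estimate student and item parameters by gradient descent.

    Y    : (n_students, n_items) array of 0/1 responses (any value where unobserved)
    mask : same shape, 1.0 where the response is observed and used for fitting
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    theta = 0.1 * rng.standard_normal((n, d))
    a = 0.1 * rng.standard_normal((m, d))
    b = np.zeros(m)
    # Scale each step by the number of observations behind that parameter
    # (a simple diagonal preconditioner, chosen here only for stability).
    per_student = np.maximum(mask.sum(axis=1), 1.0)[:, None]
    per_item = np.maximum(mask.sum(axis=0), 1.0)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-logits(theta, a, b)))
        resid = mask * (p - Y)              # d(NLL)/d(logit) on observed cells
        g_theta = resid @ a + lam * theta
        g_a = resid.T @ theta + lam * a
        g_b = resid.sum(axis=0)
        theta -= lr * g_theta / per_student
        a -= lr * g_a / per_item[:, None]
        b -= lr * g_b / per_item
    return theta, a, b

# Simulated data from a two-dimensional model (all values are made up).
rng = np.random.default_rng(1)
n, m, d_true = 200, 30, 2
theta_true = rng.standard_normal((n, d_true))
a_true = rng.standard_normal((m, d_true))
b_true = rng.standard_normal(m)
P = 1.0 / (1.0 + np.exp(-logits(theta_true, a_true, b_true)))
Y = (rng.random((n, m)) < P).astype(float)

# Elementwise hold-out: hide a random 20% of individual responses.
train = (rng.random((n, m)) < 0.8).astype(float)
held = 1.0 - train

heldout_nll = {}
for d in (1, 2, 4):
    theta, a, b = fit_penalized_jml(Y, train, d=d)
    # Mean unpenalized NLL on the hidden entries
    heldout_nll[d] = penalized_nll(Y, held, theta, a, b, lam=0.0) / held.sum()

best_d = min(heldout_nll, key=heldout_nll.get)
```

Here a single random split stands in for full cross-validation; in practice one would repeat the split over several folds and also search over the penalty weight `lam`, in the spirit of the model-selection procedure the abstract describes.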

Paper Structure (8 sections, 18 equations, 1 table, 2 algorithms)