Scalable Learning of Item Response Theory Models
Susanne Frick, Amer Krivošija, Alexander Munteanu
TL;DR
This work tackles scalable learning of Item Response Theory (IRT) models when both the number $n$ of examinees and the number $m$ of items are very large, by introducing coreset-based data summarization into the standard alternating optimization framework. It exploits the close link between the 2PL IRT subproblems and logistic regression to construct provably small coresets via sensitivity sampling and leverage-score techniques, and extends the approach to the more challenging 3PL model. The authors provide concrete sublinear coreset size bounds for both 2PL and 3PL, together with an algorithmic pipeline that remains constant across iterations, yielding substantial computational savings while preserving statistical accuracy. Empirical results on synthetic data and on real-world datasets (SHARE, NEPS) show significant speedups and memory reductions with only minor degradation in parameter estimates, demonstrating that scalable IRT learning is practical for large-scale assessments and ML benchmarks. The work thus enables large-scale psychometrics and model-based evaluation tasks that were previously computationally prohibitive, and lays the groundwork for applying coreset sketching to broader IRT families and to future solver improvements.
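The link between the 2PL subproblems and logistic regression can be spelled out in a few lines. The derivation below assumes the commonly used parameterization with ability $\theta_i$, discrimination $a_j$, difficulty $b_j$, guessing parameter $c_j$ (3PL only) and binary response $y_{ij} \in \{0, 1\}$; this notation is an assumption and may differ from the paper's.

```latex
P(Y_{ij}=1 \mid \theta_i, a_j, b_j) = \sigma\big(a_j(\theta_i - b_j)\big),
\qquad \sigma(z) = \frac{1}{1+e^{-z}}.

% Item step (abilities fixed): set x_i = (\theta_i, 1)^\top and \beta_j = (a_j, -a_j b_j)^\top,
% so that a_j(\theta_i - b_j) = x_i^\top \beta_j. The negative log-likelihood of item j is
\mathcal{L}_j(\beta_j) = \sum_{i=1}^{n} \Big[ \log\big(1 + e^{x_i^\top \beta_j}\big) - y_{ij}\, x_i^\top \beta_j \Big],
% i.e. a logistic regression over the n examinees, with data points in \mathbb{R}^2.

% Ability step (items fixed): a_j \theta_i - a_j b_j is linear in \theta_i with known offset
% -a_j b_j, i.e. a one-dimensional logistic regression over the m items.

% 3PL adds a guessing parameter c_j \in [0, 1):
P(Y_{ij}=1 \mid \theta_i, a_j, b_j, c_j) = c_j + (1 - c_j)\,\sigma\big(a_j(\theta_i - b_j)\big),
% which no longer has the plain logistic-regression form.
```

Under this reading, each alternating half-step of 2PL training is a batch of logistic regressions, which is why coresets for logistic regression carry over to the 2PL case, while the 3PL case needs additional work.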
Abstract
Item Response Theory (IRT) models aim to assess the latent abilities of $n$ examinees along with the latent difficulty characteristics of $m$ test items from categorical data indicating the quality of the examinees' answers. Classical psychometric assessments are based on a relatively small number of examinees and items, say a class of $200$ students solving an exam comprising $10$ problems. More recent global large-scale assessments such as PISA, as well as internet studies, can involve far larger numbers of participants. Additionally, in the context of Machine Learning, where algorithms take the role of examinees and data analysis problems take the role of items, both $n$ and $m$ may become very large, challenging the efficiency and scalability of computations. To learn the latent variables in IRT models from large data, we leverage the similarity of these models to logistic regression, which can be approximated accurately using small weighted subsets called coresets. We develop coresets for use in alternating IRT training algorithms, facilitating scalable learning from large data.
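The core idea, replacing a logistic-regression subproblem by a small weighted subset, can be illustrated with a minimal NumPy sketch of one 2PL item step. It assumes abilities $\theta_i$ are held fixed, so item $j$'s parameters $(a_j, d_j)$ with $d_j = -a_j b_j$ are fit by logistic regression on $x_i = (\theta_i, 1)$. The sampling rule (probability proportional to the leverage score plus a uniform term $1/n$, weight $1/(k p_i)$) is a simple illustrative variant of sensitivity sampling, not the authors' construction, and carries none of their guarantees; all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_loss(X, y, beta, w=None):
    # Weighted negative log-likelihood: sum_i w_i * (log(1 + e^{z_i}) - y_i * z_i).
    z = X @ beta
    w = np.ones(len(y)) if w is None else w
    return float(np.sum(w * (np.logaddexp(0.0, z) - y * z)))


def fit_logistic(X, y, w=None, iters=50, tol=1e-10):
    # Weighted logistic regression fit by plain Newton's method, starting at zero.
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (p - y))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def leverage_coreset(X, k, rng):
    # Hypothetical sampling rule: p_i proportional to leverage score + 1/n.
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)             # reduced QR; squared row norms of Q are leverage scores
    lev = np.sum(Q ** 2, axis=1)
    s = lev + 1.0 / n
    p = s / s.sum()
    idx = rng.choice(n, size=k, replace=True, p=p)
    weights = 1.0 / (k * p[idx])       # inverse-probability weights keep the loss unbiased
    return idx, weights


# Usage: synthetic responses of n examinees to a single item j.
rng = np.random.default_rng(0)
n, k = 20000, 3000
theta = rng.standard_normal(n)                 # abilities, held fixed in this half-step
X = np.column_stack([theta, np.ones(n)])       # x_i = (theta_i, 1)
beta_true = np.array([1.5, -0.75])             # (a_j, d_j) with d_j = -a_j * b_j
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

beta_full = fit_logistic(X, y)                 # fit on all n examinees
idx, w = leverage_coreset(X, k, rng)
beta_core = fit_logistic(X[idx], y[idx], w)    # fit on k weighted examinees
ratio = logistic_loss(X, y, beta_core) / logistic_loss(X, y, beta_full)
print(beta_full, beta_core, ratio)
```

Because the weights are inverse sampling probabilities, the weighted coreset loss is an unbiased estimate of the full loss at any fixed parameter. In an alternating scheme the same idea would be applied to every item step (sampling examinees) and every ability step (sampling items); the printed ratio is the full-data loss of the coreset solution relative to the full-data optimum.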
