Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales

Joakim Wallmark, Maria Josefsson, Marie Wiberg

TL;DR

This study presents a new model for multiple choice data, the monotone multiple choice (MMC) model, which is fit using autoencoders. It also illustrates how the latent trait scale from any fitted IRT model can be transformed into a ratio scale, which aids score interpretation and makes it easier to compare different types of IRT models.

Abstract

Item Response Theory (IRT) is a powerful statistical approach for evaluating test items and determining test taker abilities through response analysis. An IRT model that better fits the data leads to more accurate latent trait estimates. In this study, we present a new model for multiple choice data, the monotone multiple choice (MMC) model, which we fit using autoencoders. Using both simulated scenarios and real data from the Swedish Scholastic Aptitude Test, we demonstrate empirically that the MMC model outperforms the traditional nominal response IRT model in terms of fit. Furthermore, we illustrate how the latent trait scale from any fitted IRT model can be transformed into a ratio scale, aiding in score interpretation and making it easier to compare different types of IRT models. We refer to these new scales as bit scales. Bit scales are especially useful for models that make minimal or no assumptions about the distribution of the latent trait, such as the autoencoder-fitted models in this study.

Paper Structure (17 sections, 10 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: Example of IRFs in a situation where the monotonicity assumption is not fulfilled.
  • Figure 2: Example of an autoencoder neural network with a 2-dimensional latent space, and an autoencoder used to fit an IRT model.
  • Figure 3: IRF to the left with its corresponding entropy curve to the right. The vertical arrows in the entropy plot show the distances added up to compute the item bit score for a test taker with a $\theta$ of 1.5.
  • Figure 4: Histograms of $\theta$ and bit scale score distributions for models fitted to the Swedish SAT.
  • Figure 5: Kernel density estimated residual distributions for 3 different IRT models. The left plot shows the grouped standardized residuals, while the right plot shows the grouped non-standardized residuals.
  • ...and 3 more figures