Reliability-Targeted Simulation of Item Response Data: Solving the Inverse Design Problem
JoonHo Lee
TL;DR
This paper addresses a gap in IRT simulation practice: marginal reliability is rarely treated as an explicit design factor. It formalizes an inverse design framework in which a single global discrimination scale $c$ is calibrated to achieve a pre-specified target reliability, giving control over data informativeness while preserving realistic item pools and latent distributions. Two complementary algorithms, Empirical Quadrature Calibration (EQC) and Stochastic Approximation Calibration (SAC), are introduced and validated across 960 conditions. EQC achieves essentially exact calibration for the average-information reliability $\tilde{\rho}$, while SAC provides unbiased, though noisier, calibration and can directly target the MSEM-based reliability $\bar{w}$. The study also uses Jensen's inequality to clarify the theoretical distinction between $\tilde{\rho}$ and $\bar{w}$, which implies that the two metrics require different calibration scales. An open-source R package, IRTsimrel, implements these methods, promoting routine reliability targeting and reporting to improve reproducibility and cross-study comparability in IRT simulation research.
Abstract
Monte Carlo simulations are the primary methodology for evaluating Item Response Theory (IRT) methods, yet marginal reliability, the fundamental metric of data informativeness, is rarely treated as an explicit design factor. Unlike multilevel modeling, where the intraclass correlation (ICC) is routinely manipulated, IRT studies typically treat reliability as an incidental outcome, creating a "reliability omission" that obscures the signal-to-noise ratio of the generated data. To address this gap, we introduce a principled framework for reliability-targeted simulation that turns reliability from an implicit by-product into a precise input parameter. We formalize the inverse design problem: solving for the global discrimination scaling factor that uniquely achieves a pre-specified target reliability. Two complementary algorithms are proposed: Empirical Quadrature Calibration (EQC) for rapid, deterministic precision, and Stochastic Approximation Calibration (SAC) for rigorous stochastic estimation. A comprehensive validation study across 960 conditions demonstrates that EQC achieves essentially exact calibration, while SAC remains unbiased across non-normal latent distributions and empirical item pools. Furthermore, we clarify the theoretical distinction between average-information and error-variance-based reliability metrics, showing that they require different calibration scales because of Jensen's inequality. An accompanying open-source R package, IRTsimrel, enables researchers to standardize reliability as a controlled experimental input.
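To make the inverse design problem concrete, the following is a minimal Python sketch of calibrating a global discrimination scale against a fixed latent sample. It is not the IRTsimrel implementation. It assumes a 2PL model, a standard normal latent sample with variance $\sigma^2 = 1$, an average-information reliability of the form $\tilde{\rho}(c) = \sigma^2 \bar{J}(c) / (\sigma^2 \bar{J}(c) + 1)$ with $\bar{J}(c)$ the mean test information, and an MSEM-based reliability of the form $\bar{w}(c) = \sigma^2 / (\sigma^2 + E[1/J(\theta; c)])$. These functional forms, the item parameters, and all function names (`info_at`, `rho_tilde`, `w_bar`, `calibrate_scale`) are illustrative assumptions, and plain bisection stands in for whatever root-finder the package uses.

```python
import math
import random


def info_at(theta, a, b, c):
    """2PL test information at theta with every discrimination scaled by c (assumed model)."""
    total = 0.0
    for a_j, b_j in zip(a, b):
        slope = c * a_j
        p = 1.0 / (1.0 + math.exp(-slope * (theta - b_j)))
        total += slope * slope * p * (1.0 - p)
    return total


def rho_tilde(c, thetas, a, b, sigma2=1.0):
    """Assumed average-information reliability: sigma2 * mean(J) / (sigma2 * mean(J) + 1)."""
    mean_info = sum(info_at(t, a, b, c) for t in thetas) / len(thetas)
    return sigma2 * mean_info / (sigma2 * mean_info + 1.0)


def w_bar(c, thetas, a, b, sigma2=1.0):
    """Assumed MSEM-based reliability: sigma2 / (sigma2 + mean(1 / J))."""
    msem = sum(1.0 / info_at(t, a, b, c) for t in thetas) / len(thetas)
    return sigma2 / (sigma2 + msem)


def calibrate_scale(target, thetas, a, b, lo=0.05, hi=10.0, iters=60):
    """Bisection on c so that rho_tilde(c) matches the target on a fixed theta sample."""
    if rho_tilde(lo, thetas, a, b) > target or rho_tilde(hi, thetas, a, b) < target:
        raise ValueError("target reliability is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho_tilde(mid, thetas, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical design: 20 items, 500 fixed latent draws, target reliability 0.80.
rng = random.Random(2024)
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]
n_items = 20
a = [rng.uniform(0.7, 1.5) for _ in range(n_items)]        # baseline discriminations
b = [-2.0 + 4.0 * j / (n_items - 1) for j in range(n_items)]  # evenly spaced difficulties

target = 0.80
c_star = calibrate_scale(target, thetas, a, b)
rho_at_c = rho_tilde(c_star, thetas, a, b)
w_at_c = w_bar(c_star, thetas, a, b)

# Jensen: mean(1/J) >= 1/mean(J), so w_bar cannot exceed rho_tilde at the same c.
print(f"c* = {c_star:.4f}, rho_tilde = {rho_at_c:.4f}, w_bar = {w_at_c:.4f}")
```

Because the latent sample is held fixed, the objective is deterministic and the root-finder can hit the target to numerical precision, which is the sense in which a quadrature-style calibration can be "essentially exact". Under the assumed definitions, $\bar{w}$ evaluated at the calibrated scale falls at or below $\tilde{\rho}$, so hitting the same target with $\bar{w}$ would typically call for a different scale.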
