A statistical framework for dynamic cognitive diagnosis in digital learning environments

Yawen Ma, Anastasia Ushakova, Kate Cain, Gabriel Wallin

Abstract

Reading is foundational for educational, employment, and economic outcomes, but a persistent proportion of students globally struggle to develop adequate reading skills. Some countries promote digital tools to support reading development alongside regular classroom instruction. Such tools generate rich log data capturing students' behaviour and performance. This study proposes a dynamic cognitive diagnostic modeling (CDM) framework based on restricted latent class models to trace students' time-varying skill mastery using log files from digital tools. Unlike traditional CDMs, which require an expert-defined skill-item mapping (the Q-matrix), our approach jointly estimates the Q-matrix and latent skill profiles, integrates log-derived covariates (e.g., reattempts, response times, counts of mastered items) and individual characteristics, and models transitions in mastery using a Bayesian estimation approach. Applied to real-world data, the model uncovers individual skill profiles and skill-item mappings, demonstrating its practical value in educational settings. Simulation studies confirm robust, high-accuracy recovery of Q-matrix structures and latent profiles across varied sample sizes, item counts, and Q-matrix sparsity levels. The framework offers a data-driven, time-dependent restricted latent class modeling approach to understanding early reading development.

Paper Structure

This paper contains 40 sections, 27 equations, 5 figures, 26 tables.

Figures (5)

  • Figure 1: The hierarchical structure of the log files. The left column shows the full structure of Boost Reading (skill families, games, levels, and attempts). The right column highlights the subset selected for analysis, including two skill families, one game from each, and relevant levels and attempts.
  • Figure 2: The relationships between covariates $Z$, latent variables $\alpha$, and responses $Y$ across three time points.
  • Figure 3: Top ten three-level combinations engaged by students in each game and year. For example, panel A shows the most frequent level sequences for the decoding game in Year 1. Each bar represents the number of students who completed the corresponding three-level combination.
  • Figure 4: Root Mean Squared Errors (RMSE) of item-level guessing and slipping parameter estimates. Panels A and B correspond to the guessing parameters under conditions with $(N=200, J=6)$ and $(N=600, J=30)$, respectively. Panels C and D show the slipping parameters under the same conditions.
  • Figure 5: RMSE of item-level guessing and slipping parameter estimates under different $\theta$ values. Panels A and B show guessing RMSEs for $\theta=0.5$ and $\theta=0.7$, respectively. Panels C and D show slipping RMSEs under the same conditions.
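
Figures 4 and 5 report item-level RMSEs for the guessing and slipping parameters. The exact computation is not given in this section, so the following is only a minimal sketch, assuming the usual definition in which each item's RMSE is taken over simulation replications. The function name `item_level_rmse` and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def item_level_rmse(estimates, truth):
    """Per-item RMSE across replications (assumed definition).

    estimates: array of shape (R, J), one row per replication,
               one column per item.
    truth:     array of shape (J,), the data-generating values.
    Returns an array of shape (J,).
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Squared error per replication and item, averaged over replications.
    return np.sqrt(np.mean((estimates - truth) ** 2, axis=0))


# Hypothetical guessing parameters for J = 3 items and R = 2 replications.
true_guess = np.array([0.10, 0.20, 0.15])
est_guess = np.array([
    [0.12, 0.18, 0.15],
    [0.08, 0.22, 0.15],
])
rmse_guess = item_level_rmse(est_guess, true_guess)
```

The same function would apply unchanged to slipping parameters, since only the arrays passed in differ.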