On the Identifiability of Diagnostic Classification Models

Guanhua Fang, Jingchen Liu, Zhiliang Ying

TL;DR

The paper develops a general identifiability theory for diagnostic classification models (DCMs) within a latent class framework. It addresses identifiability of the item parameters, the attribute distribution, and the item-specific partial information structure encoded by the Q-matrix. It uses Kruskal-type arguments and T-matrix rank conditions to derive sufficient conditions for identifiability, and it introduces a Bayesian estimation framework with a Dirichlet-process-style stick-breaking prior that accommodates an unbounded number of latent classes; partial information is inferred by clustering item response probabilities. Through extensive simulations under NIDA, NC-RUM, and LCDM settings and a real NESARC Social Phobia analysis, the authors demonstrate consistent estimation and successful reconstruction of the qualitative item-attribute relationships, supporting reliable diagnostic inference. The work provides a unified foundation for identifiability in DCMs that does not depend on a specific model parameterization, together with practical estimation tools for uncovering the underlying attribute structure without pre-specifying the number of latent classes.
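
The stick-breaking prior mentioned above can be illustrated with a minimal sketch. It assumes the standard GEM construction with a concentration parameter `concentration` and a finite truncation level `truncation` (both hypothetical names chosen here); the paper's actual prior specification and sampler may differ. Each class takes a Beta(1, concentration) fraction of whatever stick remains, and the last break takes all remaining mass so the weights sum to one.

```python
import numpy as np


def stick_breaking_weights(concentration: float, truncation: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Draw latent class weights pi_1..pi_K from a truncated stick-breaking prior.

    Assumption: standard GEM(concentration) construction truncated at
    `truncation` components; the final break absorbs the leftover mass.
    """
    if truncation < 1:
        raise ValueError("truncation must be >= 1")
    # Fraction of the remaining stick taken by each class: v_k ~ Beta(1, concentration).
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # close the stick so the weights sum to one
    # Stick length remaining before break k: prod_{l < k} (1 - v_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(concentration=1.0, truncation=20, rng=rng)
print(weights.round(3), weights.sum())
```

Smaller values of `concentration` put most of the mass on the first few classes, which is how a prior of this kind lets the data, rather than a pre-specified number, determine how many latent classes are effectively occupied.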

Abstract

This paper establishes fundamental results for statistical inference in diagnostic classification models (DCMs). The results are developed at a high level of generality and apply to essentially all DCMs. In particular, we establish identifiability of various model parameters, notably the item response probabilities, the attribute distribution, and the Q-matrix-induced partial information structure. We also construct consistent estimators. Simulation results show that these estimators perform well under various modeling settings, and a real example illustrates the new method. The results are stated in the setting of general latent class models; for a DCM with a specific parameterization, the conditions may be adapted accordingly.
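
To make the Q-matrix-induced partial information structure concrete, the sketch below (hypothetical helper names, not the authors' code) does two things for a single item. It builds the partition of attribute profiles that agree on every attribute the item requires, and it recovers a partition from estimated response probabilities by splitting the sorted values wherever consecutive estimates differ by more than a tolerance `gap`. The gap rule is a simple stand-in for the clustering step described in the TL;DR; the paper's clustering procedure may differ. The Q-induced partition shown here is the finest one compatible with the Q-matrix; restricted models such as DINA merge classes further, since a DINA item takes only two distinct response probabilities.

```python
import itertools

import numpy as np


def q_induced_partition(q_row, profiles):
    """Group profile indices that agree on all attributes required by q_row."""
    groups = {}
    for idx, alpha in enumerate(profiles):
        # Only the attributes the item requires affect its response probability.
        key = tuple(a for a, q in zip(alpha, q_row) if q == 1)
        groups.setdefault(key, []).append(idx)
    return sorted(groups.values())


def cluster_probabilities(p_hat, gap):
    """Split sorted estimates p_hat[alpha] into groups at jumps larger than `gap`."""
    order = np.argsort(p_hat)
    clusters = [[int(order[0])]]
    for prev, cur in zip(order[:-1], order[1:]):
        if p_hat[cur] - p_hat[prev] > gap:
            clusters.append([])  # a large jump starts a new cluster
        clusters[-1].append(int(cur))
    return sorted(sorted(c) for c in clusters)


# Two attributes; the item requires only the first one.
profiles = list(itertools.product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
print(q_induced_partition((1, 0), profiles))
print(cluster_probabilities(np.array([0.21, 0.19, 0.82, 0.80]), gap=0.1))
```

In this toy case both calls group profiles {(0,0), (0,1)} and {(1,0), (1,1)}, i.e. the estimated probabilities reproduce the qualitative item-attribute relationship.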

Paper Structure

This paper contains 17 sections, 7 theorems, 47 equations, 17 tables.

Key Result

Theorem 1

We consider the general setting of a latent class model with $M>2$ latent classes. The responses are binary, taking values in $\{0,1\}$. For each item $j$, let $p_{j\boldsymbol\alpha} = P(Y_j = 1 \mid \boldsymbol\alpha)$, and let $\pi_{\boldsymbol\alpha}$ be the probability of latent class $\boldsymbol\alpha$. Suppose … Then, the item parameters $p_{j\boldsymbol\alpha}$ and the latent class population $\pi_{\boldsymbol\alpha}$ …
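
Under the local independence assumption that is standard for latent class models, the quantities in Theorem 1 determine the distribution of a response vector through $P(\mathbf Y = \mathbf y) = \sum_{\boldsymbol\alpha} \pi_{\boldsymbol\alpha} \prod_j p_{j\boldsymbol\alpha}^{y_j}(1-p_{j\boldsymbol\alpha})^{1-y_j}$. The sketch below computes this probability and one common form of a T-matrix, whose rows are indexed by nonempty item subsets $S$ and whose entries are $\prod_{j\in S} p_{j\boldsymbol\alpha}$, so that $T\boldsymbol\pi$ lists the probabilities $P(Y_j = 1 \text{ for all } j \in S)$. This construction is an assumption for illustration; the paper's T-matrix and its rank conditions may be defined differently. Here `p` is a hypothetical $J \times M$ array of item parameters and `pi` a length-$M$ vector of class probabilities.

```python
import itertools

import numpy as np


def pattern_probability(y, p, pi):
    """P(Y = y) = sum_a pi[a] * prod_j p[j,a]^y_j * (1 - p[j,a])^(1 - y_j)."""
    y = np.asarray(y)[:, None]               # shape (J, 1), broadcast over classes
    per_item = np.where(y == 1, p, 1.0 - p)  # shape (J, M)
    return float(per_item.prod(axis=0) @ pi)


def t_matrix(p):
    """Rows: nonempty item subsets S; columns: classes; entries prod_{j in S} p[j, a]."""
    n_items = p.shape[0]
    subsets = [s for r in range(1, n_items + 1)
               for s in itertools.combinations(range(n_items), r)]
    rows = [p[list(s), :].prod(axis=0) for s in subsets]
    return np.vstack(rows), subsets


rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=(3, 4))  # J = 3 items, M = 4 classes
pi = rng.dirichlet(np.ones(4))
T, subsets = t_matrix(p)
print(T.shape, np.linalg.matrix_rank(T))
```

If `T` has full column rank (`np.linalg.matrix_rank(T) == T.shape[1]`), then for known item parameters the class probabilities are the unique solution of the linear system $T\boldsymbol\pi = \boldsymbol\gamma$, where $\boldsymbol\gamma$ collects the observable subset probabilities; rank conditions of this type are what make such linear-algebraic arguments go through.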

Theorems & Definitions (13)

  • Example 1: DINA model, Junker
  • Example 2: NIDA model
  • Example 3: Reduced NC-RUM model
  • Example 4: DINO model
  • Example 5: C-RUM model
  • Definition 1
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • ...and 3 more
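
The example models differ only in how the item response probability $p_{j\boldsymbol\alpha}$ depends on the attribute profile $\boldsymbol\alpha$ and the item's Q-matrix row. The sketch below writes out commonly used textbook forms for a single item; the parameter names (`slip`, `guess`, `pi_star`, `r_star`, `beta0`, `beta`) are chosen here, and the paper's Examples 1 to 5 may use different notation or parameterizations. Profiles `alpha` and Q-rows `q` are assumed to be 0/1 NumPy integer arrays of length K.

```python
import numpy as np


def dina(alpha, q, slip, guess):
    """DINA: 1 - slip if every attribute required by q is mastered, else guess."""
    eta = np.all(alpha[q == 1] == 1)
    return 1.0 - slip if eta else guess


def dino(alpha, q, slip, guess):
    """DINO: 1 - slip if at least one required attribute is mastered, else guess."""
    omega = np.any(alpha[q == 1] == 1)
    return 1.0 - slip if omega else guess


def nida(alpha, q, slip, guess):
    """NIDA: product over required attributes k of (1 - s_k) if mastered, else g_k."""
    per_attr = np.where(alpha == 1, 1.0 - slip, guess)  # slip, guess: length-K arrays
    return float(np.prod(per_attr ** q))                # q_k = 0 contributes a factor 1


def reduced_nc_rum(alpha, q, pi_star, r_star):
    """Reduced NC-RUM: pi*_j times r*_jk for each required attribute not mastered."""
    return float(pi_star * np.prod(r_star ** (q * (1 - alpha))))


def c_rum(alpha, q, beta0, beta):
    """C-RUM: logistic link with main effects of required, mastered attributes."""
    eta = beta0 + np.sum(beta * q * alpha)
    return float(1.0 / (1.0 + np.exp(-eta)))


alpha = np.array([1, 1, 0])
q = np.array([1, 0, 1])
print(dina(alpha, q, 0.1, 0.2), dino(alpha, q, 0.1, 0.2))
```

Evaluating any of these functions at every profile $\boldsymbol\alpha$ yields the item's vector of $p_{j\boldsymbol\alpha}$ values, which the general latent class setting of Theorem 1 treats without assuming any particular functional form.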