Latent class analysis for multi-layer categorical data

Huan Qing

TL;DR

This work extends latent class analysis to multi-layer categorical data with polytomous responses by introducing the multi-layer latent class model (multi-layer LCM) and three scalable spectral estimators (LCA-SoR, LCA-DSoG, LCA-SoG). The methods rely on aggregating layer-wise responses via $R_{sum}$, $S_{sum}$, and the debiased $\tilde S_{sum}$ to recover the latent-class assignments through K-means on the leading singular/eigenvectors, with $\Theta_l$ recovered from the estimated latent structure. The authors prove estimation consistency under mild sparsity conditions, show that more layers and a debiased Gram approach improve accuracy (with LCA-DSoG typically performing best), and propose a modularity-based criterion to select the number of latent classes. Experimental results corroborate the theory, demonstrating improved latent-class recovery and robust $K$-estimation in multi-layer polytomous data, with practical implications for psychology, education, and survey research.

Abstract

Traditional categorical data, often collected in psychological tests and educational assessments, are typically single-layer and gathered only once. This paper considers a more general case: multi-layer categorical data with polytomous responses. To model such data, we present a novel statistical model, the multi-layer latent class model (multi-layer LCM). This model assumes that all layers share common subjects and items. To discover subjects' latent classes and other model parameters under this model, we develop three efficient spectral methods based on the sum of response matrices, the sum of Gram matrices, and the debiased sum of Gram matrices, respectively. Within the framework of multi-layer LCM, we demonstrate the estimation consistency of these methods under mild conditions regarding data sparsity. Our theoretical findings reveal two key insights: (1) increasing the number of layers can enhance the performance of the proposed methods, highlighting the advantages of considering multiple layers in latent class analysis; (2) the algorithm based on the debiased sum of Gram matrices usually performs best. Additionally, we propose an approach that combines the averaged modularity metric with our methods to determine the number of latent classes. Extensive experiments support our theoretical results and show the effectiveness of our methods in learning latent classes and estimating the number of latent classes in multi-layer categorical data with polytomous responses.
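The spectral pipeline described above (aggregate the layers, take the leading singular vectors or eigenvectors, run K-means, then recover $\Theta_l$) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's reference implementation: the function names (`simulate`, `kmeans`, `lca_sor`, `lca_dsog`, `estimate_theta`, `clustering_accuracy`) are hypothetical, the binomial response generator is assumed, the debiasing step is assumed to subtract the diagonal of each Gram matrix $R_l R_l^\top$, and $\Theta_l$ is assumed to be estimated by class-wise mean responses.

```python
from itertools import permutations

import numpy as np


def simulate(rng, N=200, J=50, K=3, L=5, M=5):
    """Draw L response matrices with entries in {0,...,M} (assumed binomial model)."""
    z = rng.integers(0, K, size=N)                   # true class of each subject
    Z = np.eye(K)[z]                                 # N x K membership matrix
    thetas = [M * rng.random((J, K)) for _ in range(L)]
    R_layers = [rng.binomial(M, (Z @ Th.T) / M).astype(float) for Th in thetas]
    return R_layers, z, thetas


def kmeans(X, K, n_iter=50):
    """Plain Lloyd's K-means with farthest-point initialisation (illustrative helper)."""
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def lca_sor(R_layers, K):
    """Sum of response matrices: K-means on the top-K left singular vectors."""
    R_sum = sum(R_layers)
    U, _, _ = np.linalg.svd(R_sum, full_matrices=False)
    return kmeans(U[:, :K], K)


def lca_dsog(R_layers, K):
    """Debiased sum of Gram matrices (assumed debiasing: zero each Gram diagonal)."""
    N = R_layers[0].shape[0]
    S = np.zeros((N, N))
    for R in R_layers:
        G = R @ R.T
        S += G - np.diag(np.diag(G))
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(np.abs(vals))[::-1][:K]         # K eigenvalues largest in magnitude
    return kmeans(vecs[:, top], K)


def estimate_theta(R, labels, K):
    """Assumed estimator of Theta_l: mean response per item within each class (J x K)."""
    Z_hat = np.eye(K)[labels]
    return R.T @ Z_hat @ np.linalg.inv(Z_hat.T @ Z_hat)   # needs every class non-empty


def clustering_accuracy(labels, truth, K):
    """Best agreement with the truth over all relabelings of the K classes."""
    return max(np.mean(np.array(p)[labels] == truth) for p in permutations(range(K)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R_layers, z, thetas = simulate(rng)
    for name, method in [("SoR", lca_sor), ("DSoG", lca_dsog)]:
        labels = method(R_layers, 3)
        print(name, "accuracy:", clustering_accuracy(labels, z, 3))
```

Zeroing the Gram diagonal is used here because the diagonal entry $\sum_j R_l(i,j)^2$ is the part of $R_l R_l^\top$ whose expectation does not follow the low-rank class structure; whether this matches the paper's exact definition of $\tilde S_{sum}$ should be checked against the full text.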

Paper Structure (12 sections, 5 theorems, 21 equations, 4 figures, 2 algorithms)

Introduction
Model
Algorithms
Main results
Estimating the number of latent classes
Experimental studies
Conclusion and future work
Proofs of theoretical results
Proof of Lemma \ref{SVDEigenDecomposition}
Proof of Lemma \ref{boundSumDeSumMLLCM}
Proof of Theorem \ref{mainMLLCM}
Proof of Lemma \ref{CompareSoRDSoG}

Key Result

Lemma 1

For a multi-layer LCM parameterized by $(Z,\{\Theta_{l}\}^{L}_{l=1})$, for $l\in[L], k\neq \tilde{k}, k\in[K], \tilde{k}\in[K]$, we have the following conclusions:

Figures (4)

Figure 1: A simple example of multi-layer categorical data with 10 subjects, 5 items, and 3 layers. Here, S$i$ denotes subject $i$ for $i\in\{1,2,\ldots,10\}$, I$j$ denotes item $j$ for $j\in\{1,2,\ldots,5\}$, and $R_{l}\in\{0,1,2,\ldots,5\}^{10\times 5}$ denotes the $l$-th response matrix for $l\in\{1,2,3\}$.
Figure 2: Numerical results of Experiment 1.
Figure 3: Numerical results of Experiment 2.
Figure 4: Numerical results of Experiment 3.

Theorems & Definitions (11)

Remark 1
Lemma 1
Remark 2
Lemma 2
Theorem 1
Lemma 3
Corollary 1
proof
proof
proof
...and 1 more
