Grade of membership analysis for multi-layer ordinal categorical data

Huan Qing

TL;DR

A new model, multi-layer GoM, is proposed to extend GoM to multi-layer ordinal categorical data, together with an estimator, GoM-DSoG, based on a debiased sum of Gram matrices, whose per-subject convergence rate is established under the multi-layer GoM model.

Abstract

Consider a group of individuals (subjects) who take the same psychological test, consisting of many questions (items), at different times, where the response options of each item have an implicit ordering. The observed responses can be recorded as multiple response matrices over time, called multi-layer ordinal categorical data, where the layers correspond to time points. Each subject is assumed to have a common mixed membership shared across all layers, which allows it to belong to multiple latent classes with varying weights; the objective of grade of membership (GoM) analysis is to estimate these mixed memberships from the data. When the test is conducted only once, the data reduce to traditional single-layer ordinal categorical data. The GoM model is a popular choice for describing single-layer categorical data with a latent mixed membership structure, but it cannot handle multi-layer ordinal categorical data. In this work, we propose a new model, multi-layer GoM, which extends GoM to multi-layer ordinal categorical data. To estimate the common mixed memberships, we propose a new approach, GoM-DSoG, based on a debiased sum of Gram matrices, and we establish its per-subject convergence rate under the multi-layer GoM model. Our theoretical results suggest that fewer no-responses, more subjects, more items, and more layers all benefit GoM analysis. We also propose an approach to select the number of latent classes. Extensive experiments verify the theoretical findings and show that GoM-DSoG outperforms its competitors and that our method accurately determines the number of latent classes.
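The estimation pipeline named above can be illustrated with a minimal NumPy sketch. Several details are assumptions, not the paper's specification: responses $R_{l}(i,j)\in\{0,\ldots,M\}$ are drawn as Binomial$(M, (\Pi\Theta'_{l})(i,j)/M)$ so that $\mathbb{E}[R_{l}]=\Pi\Theta'_{l}$; "debiasing" is taken to mean removing the diagonal of each Gram matrix $R_{l}R'_{l}$, whose entries $\sum_{j}R_{l}(i,j)^{2}$ carry a noise-variance bias; and the vertex-hunting step uses the successive projection algorithm. The function names `debiased_sum_of_gram`, `successive_projection`, and `gom_dsog_sketch` are hypothetical.

```python
import itertools

import numpy as np


def debiased_sum_of_gram(R_list):
    """Sum of R_l R_l' over layers, with each diagonal removed (assumed debiasing)."""
    N = R_list[0].shape[0]
    S = np.zeros((N, N))
    for R in R_list:
        G = R @ R.T
        S += G - np.diag(np.diag(G))  # drop sum_j R_l(i,j)^2 on the diagonal
    return S


def successive_projection(U, K):
    """Return K row indices of U that approximate the simplex vertices."""
    X = U.copy()
    idx = []
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(X, axis=1)))  # farthest row is a vertex
        idx.append(i)
        v = X[i] / np.linalg.norm(X[i])
        X = X - np.outer(X @ v, v)  # project all rows off the chosen direction
    return idx


def gom_dsog_sketch(R_list, K):
    """Hypothetical GoM-DSoG-style estimate of the N x K membership matrix."""
    S = debiased_sum_of_gram(R_list)
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(np.abs(vals))[::-1][:K]
    U = vecs[:, top]                  # K leading eigenvectors (by magnitude)
    I = successive_projection(U, K)   # estimated pure subjects, one per class
    Z = U @ np.linalg.inv(U[I, :])    # Ideal Simplex: U = Pi U(I,:)
    Z = np.clip(Z, 0.0, None)         # zero out small negative weights
    return Z / Z.sum(axis=1, keepdims=True)


# Synthetic check under the assumed Binomial response model.
rng = np.random.default_rng(0)
N, J, L, K, M = 400, 300, 5, 3, 5
Pi = rng.dirichlet(np.ones(K), size=N)
Pi[:K] = np.eye(K)                    # guarantee one pure subject per class
R_list = []
for _ in range(L):
    Theta = rng.uniform(0, M, size=(J, K))
    R0 = Pi @ Theta.T                 # expected responses, entries in [0, M]
    R_list.append(rng.binomial(M, R0 / M).astype(float))
Pi_hat = gom_dsog_sketch(R_list, K)

# Classes are identified only up to relabeling, so match columns before comparing.
err = min(np.abs(Pi_hat[:, list(p)] - Pi).mean()
          for p in itertools.permutations(range(K)))
print(f"mean absolute membership error: {err:.3f}")
```

Taking the top eigenvalues by magnitude rather than by value is a defensive choice here: removing the diagonal shifts the bulk of the spectrum negative, and the sketch only needs the $K$ eigenvalues that dominate it.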

Paper Structure

This paper contains 19 sections, 5 theorems, 22 equations, 11 figures, 1 table, and 1 algorithm.

Key Result

Proposition 1

(Identifiability). Assume that $\mathrm{rank}(\sum_{l=1}^{L}\Theta'_{l}\Theta_{l})=K$. Then, our multi-layer GoM model is identifiable: for any $(\Pi, \{\Theta_{l}\}^{L}_{l=1})$ and $(\tilde{\Pi}, \{\tilde{\Theta}_{l}\}^{L}_{l=1})$, if $\Pi\Theta'_{l}=\tilde{\Pi}\tilde{\Theta}'_{l}$ for $l\in[L]$, …
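The rank condition in Proposition 1 is easy to check numerically. The sketch below takes each $\Theta_{l}$ as a $J\times K$ matrix (consistent with $\Pi\Theta'_{l}$ being $N\times J$) and uses a hypothetical helper, `rank_condition_holds`. The toy entries are arbitrary: neither layer on its own has $\mathrm{rank}(\Theta'_{l}\Theta_{l})=K$, yet the sum over layers does, which illustrates how pooling layers can help.

```python
import numpy as np


def rank_condition_holds(Theta_list, K):
    """Check rank(sum_l Theta_l' Theta_l) == K for J x K item matrices Theta_l."""
    G = sum(T.T @ T for T in Theta_list)  # K x K aggregated Gram matrix
    return int(np.linalg.matrix_rank(G)) == K


# Each layer alone is rank-deficient, but the two layers together are not.
Theta1 = np.array([[1.0, 2.0, 0.0],
                   [3.0, 1.0, 0.0]])   # third class loads on no item
Theta2 = np.array([[0.0, 0.0, 4.0],
                   [0.0, 0.0, 1.0]])   # only the third class loads
print(rank_condition_holds([Theta1], 3))          # False
print(rank_condition_holds([Theta2], 3))          # False
print(rank_condition_holds([Theta1, Theta2], 3))  # True
```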

Figures (11)

  • Figure 1: A toy example of multi-layer categorical data with 12 subjects, 10 items, 3 layers, and 6 choices per item, where we use S$i$, I$j$, $R_{l}$ to represent the $i$-th subject, the $j$-th item, and the $l$-th observed response matrix, respectively, for $i=1,2,\ldots,12, j=1,2,\ldots,10$, and $l=1,2,3$.
  • Figure 2: Illustration of the Ideal Simplex geometry of the eigenvector matrix $U$ with $K=3$. Each point denotes a row of $U$: red points are rows corresponding to pure subjects (since $U(i,:)=U(j,:)$ for any two pure subjects $i$ and $j$ in the same latent class, each red point may represent many pure subjects), black points correspond to mixed subjects, and the red triangle denotes the Ideal Simplex. For visualization, the points have been projected from $\mathbb{R}^{3}$ to $\mathbb{R}^{2}$.
  • Figure 3: Experiment 1.
  • Figure 4: Experiment 2.
  • Figure 5: Experiment 3.
  • ...and 6 more figures
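The Ideal Simplex of Figure 2 can be reproduced in a noise-free setting. This sketch borrows the sizes of the Figure 1 toy example (12 subjects, 10 items, 3 layers) and takes $K=3$ as in Figure 2; it assumes the population matrix is the undebiased sum $\sum_{l}(\Pi\Theta'_{l})(\Pi\Theta'_{l})'$, draws $\Theta_{l}$ uniformly on $[0,5]$ as an arbitrary choice, and makes subjects 0, 1, 2 pure. It checks that $U=\Pi U(\mathcal{I},:)$, where $\mathcal{I}$ indexes the pure subjects, so every row of $U$ is a convex combination of the $K$ vertex rows.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, L, K = 12, 10, 3, 3          # sizes borrowed from the Figure 1 toy example
Pi = rng.dirichlet(np.ones(K), size=N)
pure = [0, 1, 2]
Pi[pure] = np.eye(K)               # subjects 0, 1, 2 are pure, one per class
Theta_list = [rng.uniform(0, 5, size=(J, K)) for _ in range(L)]

# Noise-free, undebiased population matrix: sum_l (Pi Theta_l')(Pi Theta_l')'
S0 = sum((Pi @ T.T) @ (Pi @ T.T).T for T in Theta_list)
vals, vecs = np.linalg.eigh(S0)
U = vecs[:, np.argsort(vals)[::-1][:K]]  # N x K leading eigenvectors

# Ideal Simplex: row i of U equals Pi(i,:) times the rows of U at pure subjects.
B = U[pure, :]
print(np.allclose(U, Pi @ B))                    # True
print(np.allclose(U @ np.linalg.inv(B), Pi))     # True: memberships recovered
```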

Theorems & Definitions (14)

  • Definition 1
  • Example 1
  • Proposition 1
  • Lemma 1
  • Remark 1
  • Theorem 1
  • Lemma 2
  • Remark 2
  • Proof
  • Proof
  • ...and 4 more