High-Dimensional Tensor Discriminant Analysis with Incomplete Tensors

Elynn Chen, Yuefeng Han, Jiayu Li

TL;DR

This paper proposes a Tensor Linear Discriminant Analysis with Missing Data (Tensor LDA-MD) algorithm, which manages high-dimensional tensor predictors with missing entries by leveraging the decomposable low-rank structure of the discriminant tensor.

Abstract

Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional tensor linear discriminant analysis. Specifically, we consider a high-dimensional tensor predictor with missing observations under the Missing Completely at Random (MCR) assumption and employ the Tensor Gaussian Mixture Model (TGMM) to capture the relationship between the tensor predictor and class label. We propose a Tensor Linear Discriminant Analysis with Missing Data (Tensor LDA-MD) algorithm, which manages high-dimensional tensor predictors with missing entries by leveraging the decomposable low-rank structure of the discriminant tensor. Our work establishes convergence rates for the estimation error of the discriminant tensor with incomplete data and minimax optimal bounds for the misclassification rate, addressing key gaps in the literature. Additionally, we derive large deviation bounds for the generalized mode-wise sample covariance matrix and its inverse, which are crucial tools in our analysis and hold independent interest. Our method demonstrates excellent performance in simulations and real data analysis, even with significant proportions of missing data.
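To make the missing-data setting concrete, here is a minimal sketch in Python with NumPy. It is not the paper's Tensor LDA-MD algorithm. It only simulates tensors whose entries are missing completely at random and averages each entry over the samples where it was observed. The helper names `mcar_mask` and `observed_mean`, the observation probability of 0.7, and the tensor dimensions are illustrative assumptions.

```python
import numpy as np


def mcar_mask(shape, observe_prob, rng):
    """Boolean mask: each entry is observed independently with prob. observe_prob."""
    return rng.random(shape) < observe_prob


def observed_mean(samples, masks):
    """Entry-wise mean over the samples in which that entry was observed.

    samples, masks: arrays of shape (n, d_1, ..., d_M).
    Entries that were never observed are returned as NaN.
    """
    counts = masks.sum(axis=0)
    sums = np.where(masks, samples, 0.0).sum(axis=0)
    out = np.full(sums.shape, np.nan)
    return np.divide(sums, counts, out=out, where=counts > 0)


# Hypothetical toy setting: n order-3 tensors sharing one mean tensor.
rng = np.random.default_rng(0)
n, dims = 500, (4, 3, 2)
true_mean = rng.normal(size=dims)
samples = true_mean + rng.normal(size=(n, *dims))
masks = mcar_mask((n, *dims), 0.7, rng)

mean_hat = observed_mean(samples, masks)
max_abs_error = np.nanmax(np.abs(mean_hat - true_mean))
```

Because the mask is independent of the data under this assumption, the observed-entry average remains an unbiased estimate of each mean entry; only the effective sample size per entry shrinks.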

Paper Structure

This paper contains 23 sections, 14 theorems, 181 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Theorem 4.1

Suppose there exists a constant $C_0 > 0$ such that $C_0^{-1} \leq \lambda_{\min}(\otimes_{m=1}^M \Sigma_m) \leq \lambda_{\max}(\otimes_{m=1}^M \Sigma_m) \leq C_0$, where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and largest eigenvalues of a matrix, respectively. Recall [...]. After a certain number of iterations of Algorithm 1, the following upper bounds hold [...]
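The eigenvalue condition on $\otimes_{m=1}^M \Sigma_m$ can be checked without forming the full Kronecker product: for symmetric positive definite factors, the extreme eigenvalues of the product are the products of the factors' extreme eigenvalues. The sketch below, in Python with NumPy, verifies this on small random matrices. The helper names and the dimensions are hypothetical and are not taken from the paper.

```python
from functools import reduce

import numpy as np


def kron_extreme_eigs(sigmas):
    """Smallest and largest eigenvalue of the Kronecker product of SPD matrices,
    computed mode by mode without building the full product."""
    lam_min, lam_max = 1.0, 1.0
    for sigma in sigmas:
        eigs = np.linalg.eigvalsh(sigma)  # returned in ascending order
        lam_min *= eigs[0]
        lam_max *= eigs[-1]
    return lam_min, lam_max


def random_spd(d, rng):
    """Hypothetical helper: a random symmetric positive definite d x d matrix."""
    a = rng.normal(size=(d, d))
    return a @ a.T / d + np.eye(d)


rng = np.random.default_rng(1)
sigmas = [random_spd(d, rng) for d in (3, 4, 2)]
lam_min, lam_max = kron_extreme_eigs(sigmas)

# Brute-force check on the explicit 24 x 24 Kronecker product.
full_eigs = np.linalg.eigvalsh(reduce(np.kron, sigmas))

# Smallest C_0 satisfying C_0^{-1} <= lam_min <= lam_max <= C_0.
c0 = max(lam_max, 1.0 / lam_min)
```

The mode-wise computation costs one small eigendecomposition per mode, whereas the explicit product has dimension $\prod_m d_m$ and quickly becomes impractical to decompose.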

Figures (3)

  • Figure 1: Figures (a) and (b) compare estimation errors of ${\cal B}$ (i.e. $\left\| \widehat{{\cal B}}^{\rm tucker} - {\cal B} \right\|_{\rm F} /\left\| {\cal B} \right\|_{\rm F}$) under different settings. (a) keeps signal strength $\sigma_m=1.5$ and $c = 4/5$ while increasing sample size, whereas (b) keeps sample size $n=800$ and $c=1$ while increasing signal strength.
  • Figure 2: Figures (a) and (b) compare estimation errors of loading matrices (i.e. $\max_{1 \leq m \leq M} \|\widehat{\boldsymbol{U}}_m \widehat{\boldsymbol{U}}_m^\top - \boldsymbol{U}_m \boldsymbol{U}_m^\top \|_2$) under different settings. (a) keeps signal strength $\sigma_m= 5.0$ while increasing sample size, whereas (b) keeps sample size $n= 800$ while increasing signal strength.
  • Figure 3: The plot compares the square root of the degrees of freedom (i.e. $\sqrt{r + \sum_{m=1}^{M} d_m r_m}$) with the estimation error $\|\widehat{{\cal B}} - {\cal B}\|_{\rm F}$. All settings keep $n=1200$ and signal strength $\sigma_m=1.5$.
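
The three quantities in the figure captions can be computed as in the sketch below, written in Python with NumPy. The function names are hypothetical, and $r$ is passed in explicitly because its definition is not given in this summary.

```python
import math

import numpy as np


def relative_frobenius_error(b_hat, b):
    """||B_hat - B||_F / ||B||_F for tensors of any order (Figure 1)."""
    return np.linalg.norm(b_hat - b) / np.linalg.norm(b)


def max_loading_error(u_hats, us):
    """max_m ||U_hat_m U_hat_m^T - U_m U_m^T||_2, spectral norm (Figure 2)."""
    return max(
        np.linalg.norm(u_hat @ u_hat.T - u @ u.T, ord=2)
        for u_hat, u in zip(u_hats, us)
    )


def sqrt_degrees_of_freedom(r, dims, ranks):
    """sqrt(r + sum_m d_m * r_m) (Figure 3); r is supplied by the caller."""
    return math.sqrt(r + sum(d * rm for d, rm in zip(dims, ranks)))


# Toy check: scaling B by 1.1 gives a relative error of 0.1.
rng = np.random.default_rng(2)
b = rng.normal(size=(5, 4, 3))
rel_err = relative_frobenius_error(1.1 * b, b)

# Rotating a loading matrix within its column space leaves the projector unchanged.
u = np.eye(4)[:, :2]
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
loading_err = max_loading_error([u @ rot], [u])

dof = sqrt_degrees_of_freedom(r=8, dims=(10, 10, 10), ranks=(2, 2, 2))
```

Comparing projectors $U U^\top$ rather than the loading matrices themselves makes the Figure 2 metric invariant to rotations of the columns, which is why the rotated example above reports an error of essentially zero.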

Theorems & Definitions (27)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 4.1: Tucker Loadings and Low-Rank Discriminant Tensor.
  • Remark 4
  • Remark 5
  • Remark 6
  • Theorem 4.2: Upper bound of misclassification rate
  • Theorem 4.3: Lower bound of misclassification rate
  • Remark 7
  • ...and 17 more