Linear Discriminant Analysis with Gradient Optimization

Cencheng Shen, Yuexiao Dong

Abstract

Linear discriminant analysis (LDA) is a fundamental classification and dimension reduction method that achieves Bayes optimality under a Gaussian mixture, but it often struggles in high-dimensional settings where the covariance matrix cannot be reliably estimated. We propose LDA with gradient optimization (LDA-GO), which learns a low-rank precision matrix via scalable gradient-based optimization. The method automatically selects between a Gaussian likelihood and a cross-entropy loss using data-driven structural diagnostics, adapting to the signal structure without user tuning. The gradient computation avoids any quadratic-sized intermediate matrix, keeping the per-iteration cost linear in the number of dimensions. Theoretically, we prove several properties of the method, including convexity of the objective functions, Bayes optimality, and a finite-sample bound on the excess error. Numerically, we conduct a variety of simulations and real data experiments showing that LDA-GO outperforms other LDA variants in a majority of settings, particularly in sparse-signal high-dimensional regimes.
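To illustrate the claim about avoiding quadratic-sized intermediates, here is a minimal sketch. It assumes the low-rank precision matrix is factored as $\Theta = W W^\top$ with $W \in \mathbb{R}^{p \times k}$, and it uses the textbook linear discriminant score $x^\top \Theta \mu - \tfrac{1}{2} \mu^\top \Theta \mu + \log \pi$. The factorization and the function names `project`, `lda_score_lowrank`, and `lda_score_dense` are illustrative assumptions, not the authors' implementation.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(W, v):
    """Return W^T v for W stored as p rows of length k; cost O(p * k)."""
    k = len(W[0])
    z = [0.0] * k
    for row, vi in zip(W, v):
        for j in range(k):
            z[j] += row[j] * vi
    return z


def lda_score_lowrank(W, x, mu, log_prior):
    """Discriminant score with Theta = W W^T (assumed factorization).

    Only k-dimensional projections are formed, never the p x p matrix.
    """
    zx = project(W, x)
    zm = project(W, mu)
    return dot(zx, zm) - 0.5 * dot(zm, zm) + log_prior


def lda_score_dense(W, x, mu, log_prior):
    """Reference version that materializes Theta explicitly; cost O(p^2 * k)."""
    p = len(W)
    theta = [[dot(W[i], W[j]) for j in range(p)] for i in range(p)]
    theta_mu = [dot(theta[i], mu) for i in range(p)]
    return dot(x, theta_mu) - 0.5 * dot(mu, theta_mu) + log_prior


if __name__ == "__main__":
    rng = random.Random(0)
    p, k = 8, 2  # hypothetical sizes, chosen only for the demo
    W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(p)]
    x = [rng.gauss(0, 1) for _ in range(p)]
    mu = [rng.gauss(0, 1) for _ in range(p)]
    gap = abs(lda_score_lowrank(W, x, mu, 0.0) - lda_score_dense(W, x, mu, 0.0))
    print("difference between low-rank and dense scores:", gap)
```

Both functions compute the same quantity. The low-rank version touches each entry of $W$ a constant number of times, so under this assumed factorization its cost grows linearly in $p$ for fixed $k$.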

Paper Structure

This paper contains 23 sections, 4 theorems, 13 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Both loss functions used by LDA-GO, i.e., the cross-entropy loss $\mathcal{L}_{\text{CE}}(\Sigma^{-1})$ defined in eq:loss and the Gaussian negative log-likelihood $\mathcal{L}_{\text{NLL}}(\Sigma^{-1})$, are convex in $\Sigma^{-1} \in \mathbb{S}^p_+$.
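
A brief sketch of why convexity of this kind is plausible, assuming the standard textbook forms of both losses (the paper's exact definitions are not reproduced on this page, so the expressions below are assumptions). Write $\Theta = \Sigma^{-1}$, hold the class means $\mu_k$ and priors $\pi_k$ fixed, and let $S$ denote the pooled within-class scatter matrix. On the positive-definite cone,

$$
\mathcal{L}_{\text{NLL}}(\Theta) = -\tfrac{1}{2}\log\det\Theta + \tfrac{1}{2}\operatorname{tr}(S\Theta) + \text{const},
$$

which is convex because $-\log\det\Theta$ is convex and $\operatorname{tr}(S\Theta)$ is linear in $\Theta$. For the cross-entropy loss, the discriminant scores

$$
\delta_k(x;\Theta) = x^\top \Theta \mu_k - \tfrac{1}{2}\mu_k^\top \Theta \mu_k + \log \pi_k
$$

are affine in $\Theta$, so

$$
\mathcal{L}_{\text{CE}}(\Theta) = \frac{1}{n}\sum_{i=1}^{n}\Big[\log\sum_{k}\exp\delta_k(x_i;\Theta) - \delta_{y_i}(x_i;\Theta)\Big]
$$

is a log-sum-exp of affine functions minus an affine function, hence convex in $\Theta$.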

Figures (2)

  • Figure 1: Structural diagnostics for automatic loss selection across all 20 simulations (see Section sec:sim). Each point represents one simulation's median $(r, \kappa)$ values. Dashed lines show the decision boundaries $r = 0.10$ and $\kappa = 10$. Blue circles: NLL selected; red squares: CE selected; orange diamond: borderline (mixed across replicates).
  • Figure 2: Classification error versus signal sparsity (Category C: $p = 500$, $n = 200$, $\Sigma = I$). As the number of active features decreases from 50 to 1, LDA-GO maintains low error while other methods remain substantially higher.

Theorems & Definitions (8)

  • Theorem 1: Convexity of the LDA-GO Objective
  • Theorem 2: Global Optimality of Local Minima
  • Theorem 3: Consistency
  • Theorem 4: Excess Risk Bound
  • proof
  • proof
  • proof
  • proof