Optimal Projections for Discriminative Dictionary Learning using the JL-lemma

G. Madhuri, Atul Negi, Kaluri V. Rangarao

TL;DR

A constructive approach that derandomizes the projection matrix using the Johnson-Lindenstrauss lemma: instead of random projections, it uses a projection matrix derived from the proposed Modified Supervised PCA, together with a heuristic for choosing the data perturbation level and the corresponding suitable description length of the dictionary atoms.

Abstract

Dimensionality reduction-based dictionary learning methods in the literature have often used iterative random projections. The dimensionality of such a random projection matrix is chosen at random, which might not yield a separable subspace structure in the transformed space. The convergence of such methods depends heavily on the initial seed values. In addition, gradient descent-based updates may get trapped in local minima. This paper proposes a constructive approach that derandomizes the projection matrix using the Johnson-Lindenstrauss (JL) lemma. Rather than reducing dimensionality via random projections, it uses a projection matrix derived from the proposed Modified Supervised PCA (M-SPCA). A heuristic is proposed to decide the data perturbation level and the corresponding suitable description length of the dictionary atoms. The projection matrix is derived in a single step, provides maximum feature-label consistency in the transformed space, and preserves the geometry of the original data. The projection matrix thus constructed is proved to be a JL-embedding. Despite the confusable classes in the OCR datasets, the dictionary trained in the transformed space generates discriminative sparse coefficients with reduced complexity. An empirical study demonstrates that the proposed method performs well even as the number of classes and the dimensionality increase. Experiments on OCR and face recognition datasets show better classification performance than the compared algorithms.
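The abstract says the projection matrix is obtained in a single step from a supervised variant of PCA. This summary does not describe what M-SPCA modifies, so the sketch below is only a stand-in under that assumption: it implements the standard HSIC-based supervised PCA of Barshan et al. (2011), where $U$ collects the top eigenvectors of $XHLX^\top$-style matrix $XHLHX^\top$, with $H$ the centering matrix and $L$ a label kernel (here the delta kernel: 1 for same-class pairs, 0 otherwise). The function name `supervised_pca` and the one-sample-per-column layout are illustrative choices, not the paper's.

```python
import numpy as np

def supervised_pca(X, y, p):
    """Standard HSIC-based supervised PCA (a stand-in, NOT the paper's M-SPCA).

    X : (d, n) array, one sample per column.
    y : (n,) array of integer class labels.
    p : number of projection directions to keep.
    Returns U, a (d, p) matrix with orthonormal columns; project with Z = U.T @ X.
    """
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
    Y = (y[:, None] == np.unique(y)[None, :]).astype(float)    # one-hot labels, (n, c)
    L = Y @ Y.T                                                # delta kernel: 1 if same class
    Q = X @ H @ L @ H @ X.T                                    # (d, d), symmetric PSD
    _, eigvecs = np.linalg.eigh(Q)                             # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :p]                             # top-p eigenvectors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 90))          # toy data: d = 50, n = 90
    y = np.repeat(np.arange(3), 30)            # three classes of 30 samples each
    U = supervised_pca(X, y, p=2)
    Z = U.T @ X                                # data in the transformed space
    print(U.shape, Z.shape)
```

With the delta kernel, $XHLHX^\top$ has rank at most $c-1$ for $c$ classes, so eigenvectors beyond the first $c-1$ carry no label information. A $p$ chosen from the JL bound is typically much larger than $c-1$, which suggests a plain supervised PCA would need some modification in that regime; how M-SPCA handles this is not stated here.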

Paper Structure

This paper contains 18 sections, 5 theorems, 20 equations, 8 figures, 8 tables, and 1 algorithm.

Key Result

Lemma 3.1

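JL-Lemma [JLlemma_proof2003]: Given a set $\mathcal{X}$ of $N$ data points in $\mathbb{R}^d$ and $0<\epsilon<1$, if $p\geq \frac{12\log N}{\epsilon^2(1.5-\epsilon)}$, then there exists a map $f:\mathbb{R}^d\to \mathbb{R}^p$ such that

$$(1-\epsilon)\|x_i-x_j\|_2^2 \le \|f(x_i)-f(x_j)\|_2^2 \le (1+\epsilon)\|x_i-x_j\|_2^2 \quad \forall\, x_i, x_j \in \mathcal{X}.$$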

Figures (8)

  • Figure 4.1: Framework of the proposed JLSPCADL for classification: $p$ is determined from $N$ and $\epsilon \in [0.3,0.4]$, $U$ from M-SPCA, $D$ and $X$ with K-SVD in the transformed space $Z$, and finally the class label from the classification rule (eq:classifyrule).
  • Figure 4.2: (a) Lower bounds on $p$ for $\epsilon=0.4$, where the curve flattens. (b) $\frac{dp}{d\epsilon}$ vs. $\epsilon$; the projection dimension of each dataset is chosen at the point where this curve starts to flatten. (c) If $\epsilon$ is close to 1, the distances between the mapped data points (solid red line), $\|f(x_i)-f(x_j)\|_2^2, \forall i,j$, could blow up.
  • Figure 4.3: The loss function for dictionary learning, when alternately optimizing $D$ (with $X$ fixed), converges within a few iterations.
  • Figure 5.1: Random choice of the number of principal components does not work well for discriminative dictionary learning in the transformed space. First row: Classification accuracy when a small sample size from each class is used to form a shared global dictionary using the proposed method. Second row: Classification accuracy for different projection dimensions on handwritten digit datasets. Third row: Classification performance on YaleB is better when the perturbation threshold, $\epsilon$, is low.
  • Figure 5.2: The inter-class similarity of UHTelPCC data (first row) and the intra-class variance of Banti data (second row) lead to confusable classes in Telugu OCR data.
  • ...and 3 more figures
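Figures 4.1 and 4.2 describe choosing $p$ from $N$ and $\epsilon$ via the bound in Lemma 3.1, reading off where $\frac{dp}{d\epsilon}$ flattens, and the risk that pairwise distances blow up as $\epsilon \to 1$. The sketch below, assuming the natural logarithm in the bound, computes the minimum $p$, the derivative of the real-valued bound, and the ratios $\|f(x_i)-f(x_j)\|_2^2 / \|x_i-x_j\|_2^2$ for any projected sample, so one can check whether a given projection is an $\epsilon$-embedding of that sample. The function names are illustrative, the flattening heuristic itself is not reproduced, and the Gaussian random projection in the demo is the randomized baseline the paper argues against, not its M-SPCA projection.

```python
import math
import numpy as np

def jl_min_dim(N, eps):
    """Smallest integer p with p >= 12 ln N / (eps^2 (1.5 - eps))."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(12 * math.log(N) / (eps ** 2 * (1.5 - eps)))

def jl_dp_deps(N, eps):
    """Derivative in eps of the real-valued bound 12 ln N / g(eps), g = 1.5 eps^2 - eps^3."""
    g = 1.5 * eps ** 2 - eps ** 3
    dg = 3 * eps - 3 * eps ** 2
    return -12 * math.log(N) * dg / g ** 2

def squared_distance_ratios(X, Z):
    """||z_i - z_j||^2 / ||x_i - x_j||^2 for all pairs i < j (columns are samples)."""
    i, j = np.triu_indices(X.shape[1], k=1)
    dx = np.sum((X[:, i] - X[:, j]) ** 2, axis=0)   # assumes distinct points (dx > 0)
    dz = np.sum((Z[:, i] - Z[:, j]) ** 2, axis=0)
    return dz / dx

def is_eps_embedding(X, Z, eps):
    """True if every pairwise squared distance is preserved within a factor 1 +/- eps."""
    r = squared_distance_ratios(X, Z)
    return bool(np.all((r >= 1 - eps) & (r <= 1 + eps)))

if __name__ == "__main__":
    # Bound and slope over the epsilon range mentioned in Figure 4.1.
    for eps in (0.3, 0.35, 0.4):
        print(eps, jl_min_dim(1000, eps), round(jl_dp_deps(1000, eps), 1))

    # Empirical check with a Gaussian random projection (randomized baseline).
    rng = np.random.default_rng(0)
    N, d, eps = 60, 500, 0.4
    X = rng.standard_normal((d, N))
    p = jl_min_dim(N, eps)
    R = rng.standard_normal((p, d)) / np.sqrt(p)
    print(p, is_eps_embedding(X, R @ X, eps))
```

The bound $\frac{12\log N}{\epsilon^2(1.5-\epsilon)}$ is algebraically the same as $\frac{4\log N}{\epsilon^2/2-\epsilon^3/3}$, and its derivative is negative on $(0,1)$, so $p$ falls steeply at small $\epsilon$ and more slowly as $\epsilon$ grows.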

Theorems & Definitions (8)

  • Lemma 3.1
  • Lemma 4.1
  • Theorem 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Proof
  • Claim 4.5
  • Proof