Optimal Projections for Discriminative Dictionary Learning using the JL-lemma

G. Madhuri; Atul Negi; Kaluri V. Rangarao

TL;DR

This paper proposes a constructive approach to derandomizing the projection matrix using the Johnson-Lindenstrauss lemma. The projection matrix is derived from the proposed Modified Supervised PC analysis, and a heuristic is proposed to decide the data perturbation levels and the corresponding suitable description length of the dictionary atoms.

Abstract

Dimensionality reduction-based dictionary learning methods in the literature have often used iterative random projections. The dimensionality of such a random projection matrix is itself chosen at random and might not lead to a separable subspace structure in the transformed space. The convergence of such methods depends heavily on the initial seed values used. Also, gradient descent-based updates might converge to local minima. This paper proposes a constructive approach to derandomize the projection matrix using the Johnson-Lindenstrauss lemma. Rather than reducing dimensionality via random projections, a projection matrix derived from the proposed Modified Supervised PC analysis is used. A heuristic is proposed to decide the data perturbation levels and the corresponding suitable description length of the dictionary atoms. The projection matrix is derived in a single step, provides maximum feature-label consistency of the transformed space, and preserves the geometry of the original data. The projection matrix thus constructed is proved to be a JL-embedding. Despite confusing classes in the OCR datasets, the dictionary trained in the transformed space generates discriminative sparse coefficients with reduced complexity. An empirical study demonstrates that the proposed method performs well even when the number of classes and dimensionality increase. Experiments on OCR and face recognition datasets show better classification performance than other algorithms.

Paper Structure

This paper contains 18 sections, 5 theorems, 20 equations, 8 figures, 8 tables, and 1 algorithm.

Introduction
Highlights of JLSPCADL
Sparse Representation Problem
DL in reduced dimensionality space: Related work
The Johnson-Lindenstrauss Lemma
Derandomization of the Projection matrix
Proposed method: JLSPCADL
Determination of optimal $\epsilon$ and $p$
Supervised PCA
Transformation of data using Modified-SPCA (M-SPCA)
Dictionary learning in the transformed space
Proposed Classification Rule
Convergence and Complexity analysis
Experiments and Results
Discussion
...and 3 more sections

Key Result

Lemma 3.1

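JL-Lemma [JLlemma_proof2003]: Given a set of $N$ data points in $\mathbb{R}^d$ and $0< \epsilon <1$, if $p\geq \frac{12\log N}{\epsilon^2(1.5-\epsilon)}$, then there exists a map $f:\mathbb{R}^d\to \mathbb{R}^p$ such that $(1-\epsilon)\|x_i-x_j\|_2^2 \leq \|f(x_i)-f(x_j)\|_2^2 \leq (1+\epsilon)\|x_i-x_j\|_2^2$ for all data points $x_i, x_j$ in the set.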
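The lower bound on $p$ in Lemma 3.1 can be evaluated directly for a given number of points and perturbation level. The following is a minimal sketch, not the authors' code: the function name `jl_min_dim` is hypothetical, and it assumes that $\log$ in the lemma denotes the natural logarithm.

```python
import math


def jl_min_dim(n_points: int, eps: float) -> int:
    """Smallest integer p with p >= 12 log N / (eps^2 (1.5 - eps)).

    Assumption: log is the natural logarithm.
    """
    if n_points < 2:
        raise ValueError("need at least two data points")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    bound = 12.0 * math.log(n_points) / (eps ** 2 * (1.5 - eps))
    return math.ceil(bound)


if __name__ == "__main__":
    # Illustrative values only; N = 10000 is not taken from the paper.
    for eps in (0.3, 0.4):
        print(f"eps={eps}: p >= {jl_min_dim(10_000, eps)}")
```

Under these assumptions the bound depends only on $N$ and $\epsilon$, not on the original dimension $d$, and it shrinks as $\epsilon$ grows.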

Figures (8)

Figure 4.1: Framework of the proposed JLSPCADL for classification: $p$ is determined from $N$ and $\epsilon \in [0.3,0.4]$, $U$ from M-SPCA, $D,X$ using K-SVD in the transformed space $Z$, and finally the classification label using the classification rule \ref{eq:classifyrule}.
Figure 4.2: (a) Lower bounds on $p$ for $\epsilon=0.4$ when the curve flattens. (b) $\frac{dp}{d\epsilon}$ vs $\epsilon$. The projection dimension of datasets is chosen at the point where the curve in (b) starts to flatten. (c) If $\epsilon$ is closer to 1, then the distance between the mapped data points (solid red line), $\|f(x_i)-f(x_j)\|_2^2, \forall i,j$, could blow up.
Figure 4.3: The loss function for dictionary learning, while alternately optimizing $D$ (fixing $X$), converges within a few iterations.
Figure 5.1: Random choice of the number of principal components does not work well for discriminative dictionary learning in the transformed space. First row: Classification accuracy when a small sample size from each class is used to form a shared global dictionary using the proposed method. Second row: Classification accuracy for different projection dimensions on handwritten digit datasets. Third row: Classification performance on YaleB is better when the perturbation threshold, $\epsilon$, is low.
Figure 5.2: The inter-class similarity of UHTelPCC data (first row) and the intra-class variance of Banti data (second row) lead to confusing classes in Telugu OCR data.
...and 3 more figures
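The caption of Figure 4.2 refers to the slope $\frac{dp}{d\epsilon}$ of the lower bound on $p$. As a hedged illustration, treating the bound in Lemma 3.1 as an equality, $p(\epsilon)=\frac{12\log N}{\epsilon^2(1.5-\epsilon)}$, differentiation gives $\frac{dp}{d\epsilon}=-\frac{36\log N\,(1-\epsilon)}{\epsilon^3(1.5-\epsilon)^2}$. The sketch below tabulates both quantities. The function names are hypothetical, the natural logarithm is assumed, and no flatness criterion is encoded because none is stated here.

```python
import math


def jl_bound(n_points: int, eps: float) -> float:
    """Lower bound 12 log N / (eps^2 (1.5 - eps)), natural log assumed."""
    return 12.0 * math.log(n_points) / (eps ** 2 * (1.5 - eps))


def jl_bound_slope(n_points: int, eps: float) -> float:
    """Analytic derivative of jl_bound with respect to eps."""
    return (-36.0 * math.log(n_points) * (1.0 - eps)
            / (eps ** 3 * (1.5 - eps) ** 2))


if __name__ == "__main__":
    n = 10_000  # illustrative value, not taken from the paper
    print(" eps        bound        slope")
    for k in range(1, 10):
        eps = k / 10
        print(f"{eps:4.1f} {jl_bound(n, eps):12.1f} {jl_bound_slope(n, eps):12.1f}")
```

The slope is negative throughout $(0,1)$ and its magnitude falls quickly as $\epsilon$ increases, which is the flattening behaviour the caption describes.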

Theorems & Definitions (8)

Lemma 3.1
Lemma 4.1
Theorem 4.2
Lemma 4.3
Lemma 4.4
proof
Claim 4.5
proof
