K-Deep Simplex: Deep Manifold Learning via Local Dictionaries

Pranay Tankala, Abiy Tasissa, James M. Murphy, Demba Ba

TL;DR

K-Deep Simplex (KDS) is a locality-aware, simplex-constrained dictionary learning framework that represents each data point as a sparse convex combination of learned archetypes. By tying the locality of the representation to a weighted $\ell_1$ regularizer and assuming a Delaunay triangulation model for the data, the authors establish identifiability, stability, and a closed-form dictionary update, and connect the method to spectral embedding via the covariance of the coefficient matrix. The method is implemented as an interpretable autoencoder through algorithm unrolling, which enables scalable training and yields competitive clustering performance on synthetic and real data sets. The learned landmarks make KDS a scalable, theoretically grounded approach to nonlinear dimensionality reduction and clustering.

Abstract

We propose K-Deep Simplex (KDS), which, given a set of data points, learns a dictionary comprising synthetic landmarks, along with representation coefficients supported on a simplex. KDS employs a local weighted $\ell_1$ penalty that encourages each data point to represent itself as a convex combination of nearby landmarks. We solve the proposed optimization program using alternating minimization and design an efficient, interpretable autoencoder using algorithm unrolling. We theoretically analyze the proposed program by relating the weighted $\ell_1$ penalty in KDS to a weighted $\ell_0$ program. Assuming that the data are generated from a Delaunay triangulation, we prove the equivalence of the weighted $\ell_1$ and weighted $\ell_0$ programs. We further show the stability of the representation coefficients under mild geometrical assumptions. If the representation coefficients are fixed, we prove that the sub-problem of minimizing over the dictionary yields a unique solution. Further, we show that low-dimensional representations can be efficiently obtained from the covariance of the coefficient matrix. Experiments show that the algorithm is highly efficient and performs competitively on synthetic and real data sets.
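
The alternating minimization described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: this summary does not state the exact objective, so the code assumes the penalty weights are squared Euclidean distances, i.e. it minimizes $\tfrac{1}{2}\|\mathbf{Y}-\mathbf{A}\mathbf{X}\|_F^2 + \lambda\sum_{i,j} x_{ij}\|\mathbf{y}_i-\mathbf{a}_j\|^2$ with each column of $\mathbf{X}$ on the simplex. The function names (`project_simplex`, `encode`, `update_dictionary`, `objective`), the step size, and the iteration count are illustrative choices. The coefficient step uses plain projected gradient descent as a stand-in for the unrolled encoder, and the dictionary step is the stationarity condition of the assumed objective with the coefficients held fixed.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def encode(y, A, lam=0.1, n_steps=200):
    """Coefficient step for one point y (shape (d,)) and atoms A (shape (d, m)).

    Projected gradient descent on the ASSUMED objective
    0.5*||y - A x||^2 + lam * sum_j x_j * ||y - a_j||^2 over the simplex.
    """
    m = A.shape[1]
    w = np.sum((A - y[:, None]) ** 2, axis=0)   # assumed locality weights
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.full(m, 1.0 / m)
    for _ in range(n_steps):
        grad = A.T @ (A @ x - y) + lam * w
        x = project_simplex(x - step * grad)
    return x

def update_dictionary(Y, X, lam=0.1):
    """Dictionary step with X (shape (m, n)) fixed, under the assumed objective.

    Setting the gradient in A to zero gives
    A (X X^T + 2*lam*D) = (1 + 2*lam) Y X^T, with D = diag(row sums of X).
    The system is solvable when every atom has nonzero total weight.
    """
    G = X @ X.T + 2.0 * lam * np.diag(X.sum(axis=1))
    return np.linalg.solve(G, (1.0 + 2.0 * lam) * X @ Y.T).T

def objective(Y, A, X, lam):
    """Value of the assumed objective; W[j, i] = ||y_i - a_j||^2."""
    W = ((A[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    return 0.5 * np.sum((Y - A @ X) ** 2) + lam * np.sum(X * W)
```

Alternating `encode` over all data points with `update_dictionary` gives one round of the scheme. A larger `lam` pushes weight toward atoms close to the point, which is the locality effect the abstract attributes to the weighted $\ell_1$ penalty.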

Paper Structure

This paper contains 42 sections, 10 theorems, 39 equations, 17 figures, 17 tables, 1 algorithm.

Key Result

Proposition 1

Let $\mathbf{R} = [\mathbf{Y} \,\,\, \mathbf{A}]\in \mathcal{R}^{d\times (n+m)}$. Then,

Figures (17)

  • Figure 1: (a-c) Training from a random initialization of atoms on the two moons data set. (d) A subset of the randomly initialized atoms for MNIST-5 (digits 0, 3, 4, 6, 7) before training (black and white) and after training and clustering (color). The number of data points is $n\approx 35000$ and the number of atoms is $m=500$. (e) Degrees of vertices in the learned similarity graph. Despite being very sparse (most digits are represented using at most 5 atoms), the learned similarity graph retains enough information about the original data set that spectral clustering recovers these digits with 99% accuracy.
  • Figure 2: The blue dots indicate the atoms which generate the data points. Each black dot, denoting a data point, is a convex combination of three atoms which are vertices of the triangle the point belongs to. Note that the circumscribing circle of any triangle does not contain any additional landmark points.
  • Figure 3: Circle and two moons. Autoencoder input (first and third) and output (second and fourth), with learned atoms marked in red.
  • Figure 4: Clustering accuracy for concentric circles across $\delta,m$.
  • Figure 5: (a) Autoencoder output and learned atoms for concentric circles, separation $\delta = 0.15$.
  • ...and 12 more figures
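
The generative picture in Figure 2 can be reproduced on a toy example. The sketch below is illustrative only: the atom coordinates, the data point, and the helper names `barycentric` and `circumcircle` are made up for this example. It recovers the convex-combination coefficients of a point inside one triangle and checks that the triangle's circumscribing circle contains no other atom, which is the empty-circle property the caption refers to.

```python
import numpy as np

def barycentric(y, tri):
    """Coefficients x with tri.T @ x = y and sum(x) = 1.

    tri is a (3, 2) array whose rows are the triangle's vertices.
    """
    M = np.vstack([tri.T, np.ones(3)])
    return np.linalg.solve(M, np.append(y, 1.0))

def circumcircle(tri):
    """Center and radius of the circle through the three vertices."""
    a, b, c = tri
    M = 2.0 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(M, rhs)
    return center, np.linalg.norm(center - a)

# Hypothetical atoms: three triangle vertices plus one extra landmark.
atoms = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.2, 1.2]])
tri = atoms[:3]
y = np.array([0.2, 0.3])                 # a data point inside the triangle

x = barycentric(y, tri)                  # sparse convex-combination coefficients
center, radius = circumcircle(tri)
# Empty-circle check: every other atom lies strictly outside the circle.
empty = np.all(np.linalg.norm(atoms[3:] - center, axis=1) > radius)
```

When `y` lies inside the triangle, all entries of `x` are nonnegative, so the point is a convex combination of exactly three atoms and has zero coefficients on every other atom.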

Theorems & Definitions (29)

  • Proposition 1
  • proof
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Lemma 1
  • proof
  • Theorem 1
  • ...and 19 more