K-Deep Simplex: Deep Manifold Learning via Local Dictionaries
Pranay Tankala, Abiy Tasissa, James M. Murphy, Demba Ba
TL;DR
K-Deep Simplex (KDS) is a locality-aware, simplex-constrained dictionary learning framework that represents each data point as a sparse convex combination of learned landmarks (archetypes). Locality is enforced through a weighted $\ell_1$ regularizer. Under a Delaunay triangulation model of the data, the authors establish identifiability, stability of the coefficients, and a closed-form dictionary update, and they connect the method to spectral embedding through the covariance of the coefficient matrix. The optimization is implemented as an interpretable autoencoder via algorithm unrolling, which enables scalable training and yields competitive clustering performance on synthetic and real datasets. The result is a scalable, theoretically grounded approach to nonlinear dimensionality reduction and clustering based on learned landmarks.
Abstract
We propose K-Deep Simplex (KDS), which, given a set of data points, learns a dictionary of synthetic landmarks together with representation coefficients supported on a simplex. KDS employs a local weighted $\ell_1$ penalty that encourages each data point to represent itself as a convex combination of nearby landmarks. We solve the proposed optimization program using alternating minimization and design an efficient, interpretable autoencoder using algorithm unrolling. We analyze the program theoretically by relating the weighted $\ell_1$ penalty in KDS to a weighted $\ell_0$ program. Assuming that the data are generated from a Delaunay triangulation, we prove the equivalence of the weighted $\ell_1$ and weighted $\ell_0$ programs. We further show that the representation coefficients are stable under mild geometric assumptions. When the representation coefficients are fixed, we prove that the sub-problem of minimizing over the dictionary has a unique solution. Finally, we show that low-dimensional representations can be obtained efficiently from the covariance of the coefficient matrix. Experiments show that the algorithm is highly efficient and performs competitively on synthetic and real data sets.
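To make the coefficient step concrete, the following is a minimal sketch of how one data point could be encoded as a convex combination of nearby landmarks when the dictionary is held fixed. It is not the authors' implementation. It assumes that the locality weights are squared Euclidean distances between the point and each landmark, and it uses plain projected gradient descent rather than the unrolled network described above. The names `project_to_simplex`, `kds_coefficients`, `lam`, and `n_iter` are hypothetical and chosen for illustration only.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                  # entries in decreasing order
    css = np.cumsum(u) - 1.0              # shifted cumulative sums
    ks = np.arange(1, n + 1)
    cond = u - css / ks > 0               # true for a prefix of indices
    rho = ks[cond][-1]                    # size of the support
    tau = css[cond][-1] / rho             # threshold to subtract
    return np.maximum(v - tau, 0.0)


def kds_coefficients(y, A, lam=0.1, n_iter=500):
    """Coefficients of y over the landmarks in the columns of A.

    Minimizes 0.5 * ||y - A x||^2 + lam * sum_j w_j * x_j over the simplex,
    where w_j = ||y - a_j||^2 is an ASSUMED form of the locality weight.
    """
    m = A.shape[1]
    w = np.sum((A - y[:, None]) ** 2, axis=0)    # assumed locality weights
    # Step size 1/L, with L the largest eigenvalue of A^T A.
    step = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)
    x = np.full(m, 1.0 / m)                      # start at the simplex center
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) + lam * w       # gradient of the objective
        x = project_to_simplex(x - step * grad)  # stay on the simplex
    return x


# Toy example: three nearby landmarks forming a triangle and one far landmark.
A = np.array([[0.0, 1.0, 0.0, 3.0],
              [0.0, 0.0, 1.0, 3.0]])
y = np.array([0.25, 0.25])
x = kds_coefficients(y, A, lam=0.1, n_iter=2000)
print(np.round(x, 3))
```

In this toy setting the far landmark receives zero weight, so the point is represented only by the vertices of the triangle that contains it. Because the penalty is nonzero, the coefficients are also pulled slightly toward the nearest landmark, so the reconstruction is close to the point but not exact.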
