Sketch and shift: a robust decoder for compressive clustering

Ayoub Belhadji, Rémi Gribonval

TL;DR

This work analyzes the weaknesses of the CL-OMPR decoder in compressive clustering and traces its failures to the optimization dynamics of a correlation function that approximates a kernel density estimator (KDE). It then introduces a robust decoder inspired by mean shift, combining a sketched mean shift variant with a fitted $k$-Gaussian model to better handle intra-cluster variance. Empirical results on synthetic data and MNIST-based features show that the proposed method recovers centroids from much smaller sketches and is more robust across bandwidth parameters, outperforming CL-OMPR especially at moderate sketch sizes. The study suggests that compressive clustering can be made practical and tunable in memory-constrained regimes by leveraging mean-shift-inspired search and Gaussian-mixture fitting, with potential extensions to high-dimensional and privacy-preserving applications.
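For readers unfamiliar with mean shift, the following is a minimal sketch of the classical algorithm on raw data points, not the sketched variant proposed in the paper. The Gaussian kernel, the function names `mean_shift_step` and `mean_shift`, the toy data, and the bandwidth value are all assumptions chosen for illustration.

```python
import math


def mean_shift_step(x, points, bandwidth):
    """One update: move x to the Gaussian-kernel-weighted mean of the points."""
    weights = [
        math.exp(-sum((xd - pd) ** 2 for xd, pd in zip(x, p)) / (2 * bandwidth ** 2))
        for p in points
    ]
    total = sum(weights)
    return tuple(
        sum(w * p[d] for w, p in zip(weights, points)) / total
        for d in range(len(x))
    )


def mean_shift(x0, points, bandwidth, tol=1e-8, max_iter=500):
    """Iterate the update until the iterate stops moving (a local maximum of the KDE)."""
    x = tuple(x0)
    for _ in range(max_iter):
        x_new = mean_shift_step(x, points, bandwidth)
        if math.dist(x, x_new) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical toy data: two tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]

# Starting points near each group climb to the corresponding density mode.
mode_a = mean_shift((0.3, 0.3), points, bandwidth=0.2)
mode_b = mean_shift((0.8, 0.8), points, bandwidth=0.2)
```

Each start converges to a mode close to the mean of the nearby group. The bandwidth controls how many modes the density has, which is why sensitivity to this parameter matters for any decoder built on the same idea.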

Abstract

Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning, by first summarizing a large dataset into a low-dimensional sketch vector, and then decoding from this sketch the latent information needed for learning. In light of recent progress on information preservation guarantees for sketches based on random features, a major objective is to design easy-to-tune algorithms (called decoders) to robustly and efficiently extract this information. To address the underlying non-convex optimization problems, various heuristics have been proposed. In the case of compressive clustering, the standard heuristic is CL-OMPR, a variant of sliding Frank-Wolfe. Yet, CL-OMPR is hard to tune, and its robustness has not been examined closely. In this work, we undertake a careful examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. To gain insight, we show how the deficiencies of this algorithm can be attributed to optimization difficulties related to the structure of a correlation function appearing at core steps of the algorithm. To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. Its design is notably inspired by the mean shift algorithm, a classic approach to detect the local maxima of kernel density estimators. The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than was previously possible.
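To make the sketching step concrete, here is a minimal illustration under stated assumptions: the sketch is taken to be the empirical average of random Fourier features with Gaussian frequencies, and the correlation function is taken to be the real part of the inner product between the sketch and the feature vector of a candidate centroid. The abstract does not give the exact feature map or normalization, so the formulas, the function names (`draw_frequencies`, `feature_map`, `sketch`, `correlation`), and the parameter values below are assumptions rather than the authors' implementation.

```python
import cmath
import math
import random


def draw_frequencies(m, dim, sigma, rng):
    """Assumed choice: m frequency vectors with i.i.d. N(0, 1/sigma^2) entries."""
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(m)]


def feature_map(x, freqs):
    """Random Fourier features exp(i * <w, x>) / sqrt(m), one per frequency w."""
    scale = 1.0 / math.sqrt(len(freqs))
    return [
        scale * cmath.exp(1j * sum(wd * xd for wd, xd in zip(w, x)))
        for w in freqs
    ]


def sketch(points, freqs):
    """Sketch = average feature vector over the dataset (a vector of size m)."""
    z = [0j] * len(freqs)
    for x in points:
        for j, phi_j in enumerate(feature_map(x, freqs)):
            z[j] += phi_j / len(points)
    return z


def correlation(c, z, freqs):
    """Re <z, Phi(c)>; for large m this approximates a Gaussian KDE at c."""
    phi_c = feature_map(c, freqs)
    return sum((z_j * phi_j.conjugate()).real for z_j, phi_j in zip(z, phi_c))


# Hypothetical toy data: two clusters of 100 points in dimension 2.
rng = random.Random(0)
centers = [(0.0, 0.0), (1.0, 1.0)]
points = [
    (cx + rng.gauss(0.0, 0.05), cy + rng.gauss(0.0, 0.05))
    for cx, cy in centers
    for _ in range(100)
]

m = 500
freqs = draw_frequencies(m, dim=2, sigma=0.2, rng=rng)
z = sketch(points, freqs)  # the dataset is now summarized by m complex numbers

f_center = correlation((0.0, 0.0), z, freqs)  # evaluated at a cluster center
f_far = correlation((3.0, 3.0), z, freqs)     # evaluated far from all data
```

Under these assumptions the correlation is large near a cluster center and close to zero far from the data, so a decoder can look for centroids among its local maxima using only `z` and `freqs`, without access to `points`.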

Paper Structure (20 sections, 16 equations, 10 figures, 3 algorithms)


Figures (10)

  • Figure 1: Average MSE of Lloyd's algorithm, CL-OMPR for two sketch sizes, and the proposed decoder on a synthetic dataset with three well-separated clusters in dimension $2$. For well-chosen values of $\sigma$ the MSE is of the order of 0.01, to be compared with inter-cluster squared distances of the order of 0.25 and intra-cluster variances of the order of 0.05.
  • Figure 2: The correlation function for $m=100$ (top) versus $m=1000$ (bottom).
  • Figure 3: A comparison between plain gradient ascent and sketched mean shift in the identification of a local maximum of $f_{z_{\mathcal{X}}}$; dynamic ranges of $\|\nabla f_{z_{\mathcal{X}}}\|_{2}$ and $\|\nabla f_{z_{\mathcal{X}}}/ f_{z_{\mathcal{X}}}\|_{2}$.
  • Figure 4: The correlation function $f_{r}$ associated with the residual $r$ at the first and second iterations of the proposed decoder using two fitted models: mixture of Diracs and mixture of Gaussians. The red points correspond to the selected centroids.
  • Figure 5: Comparison of CL-OMPR and the proposed decoder with three synthetic clusters in $[-1,1]^6$.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Definition 1
  • Remark 1
  • Definition 2