Iterative Exploration-Driven Sparse SDP Clustering via Thompson Sampling

Jongmin Mun, Paromita Dubey, Yingying Fan

TL;DR

This paper studies high-dimensional sparse clustering, a combinatorial NP-hard problem arising from the bilinear coupling between cluster assignment and feature selection, and proposes a block-coordinate ascent framework that alternates between SDP-based clustering and non-conservative feature selection.

Abstract

This paper studies high-dimensional sparse clustering, a combinatorial NP-hard problem arising from the bilinear coupling between cluster assignment and feature selection. We analyze semidefinite programming (SDP) relaxations of $K$-means and establish minimax separation bounds, demonstrating that these relaxations are theoretically robust to feature over-selection: exact recovery is preserved even in the presence of non-informative features. Leveraging this robustness, we propose a block-coordinate ascent framework that alternates between SDP-based clustering and non-conservative feature selection. To address the tendency of deterministic greedy methods to become trapped in local optima, we formulate the feature selection step as a Thompson sampling bandit problem. This approach introduces adaptive memory by aggregating historical variable-selection outcomes into posterior distributions, and selects features via posterior sampling, enabling stochastic exploration that promotes the inclusion of under-explored features and facilitates escape from local maxima. We establish conditions for consistent variable selection and exact clustering recovery, and extend the method to settings with unknown covariance through a scalable, inverse-free estimation procedure. Numerical experiments demonstrate that the proposed memory-driven approach consistently outperforms state-of-the-art sparse clustering methods.
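The alternating scheme described in the abstract can be illustrated with a minimal, standard-library Python sketch. Several pieces here are assumptions made for illustration and are not taken from the paper: plain Lloyd 2-means stands in for the SDP clustering step, the Bernoulli reward is a hypothetical mean-difference test with an arbitrary `threshold`, the priors are Beta(1, 1), and a feature is kept when its posterior draw exceeds 1/2. All function names are hypothetical.

```python
import random


def two_means(X, feats, iters=20, rng=random):
    """Stand-in clustering step: plain Lloyd 2-means (NOT the paper's SDP)."""
    i, j = rng.sample(range(len(X)), 2)  # two distinct points as initial centers
    centers = [[X[i][f] for f in feats], [X[j][f] for f in feats]]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest center on the selected features only.
        labels = [
            min((0, 1), key=lambda k: sum((x[f] - c) ** 2
                                          for f, c in zip(feats, centers[k])))
            for x in X
        ]
        # Update step: recompute each non-empty cluster's center.
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                centers[k] = [sum(x[f] for x in members) / len(members)
                              for f in feats]
    return labels


def bernoulli_reward(X, labels, f, threshold):
    """Hypothetical reward: 1 if cluster means on feature f differ by > threshold."""
    g0 = [x[f] for x, lab in zip(X, labels) if lab == 0]
    g1 = [x[f] for x, lab in zip(X, labels) if lab == 1]
    if not g0 or not g1:
        return 0
    return int(abs(sum(g0) / len(g0) - sum(g1) / len(g1)) > threshold)


def thompson_sparse_clustering(X, rounds=30, threshold=1.0, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    a = [1.0] * p  # Beta posterior success counts (assumed Beta(1, 1) prior)
    b = [1.0] * p  # Beta posterior failure counts
    labels = two_means(X, list(range(p)), rng=rng)  # non-sparse initialization
    for _ in range(rounds):
        # Feature step: posterior sampling; keep features whose draw exceeds 1/2.
        selected = [j for j in range(p) if rng.betavariate(a[j], b[j]) > 0.5]
        if not selected:
            continue
        # Memory update: fold this round's rewards into the posteriors.
        for j in selected:
            r = bernoulli_reward(X, labels, j, threshold)
            a[j] += r
            b[j] += 1 - r
        # Clustering step: re-cluster using only the sampled features.
        labels = two_means(X, selected, rng=rng)
    chosen = [j for j in range(p) if a[j] / (a[j] + b[j]) > 0.5]
    return labels, chosen, a, b


def make_data(n=40, p=10, s=2, shift=3.0, seed=1):
    """Synthetic two-cluster data: only the first s features carry signal."""
    rng = random.Random(seed)
    X = []
    for i in range(n):
        sign = 1.0 if i % 2 else -1.0
        X.append([rng.gauss(sign * shift if j < s else 0.0, 1.0)
                  for j in range(p)])
    return X


if __name__ == "__main__":
    labels, chosen, a, b = thompson_sparse_clustering(make_data())
    print("selected features:", chosen)
```

Because each round's draw is random, a feature with few recorded pulls still has a wide posterior and can be sampled again, which is the exploration mechanism the abstract refers to. A faithful implementation would replace `two_means` with an SDP solver and use the reward defined in the paper.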

Paper Structure

This paper contains 20 sections, 7 theorems, 49 equations, 2 figures, 2 tables, 4 algorithms.

Key Result

Theorem 1

Assume that there exists a universal constant $C_1$ such that $m \geq C_1 n/\log n$. Also assume $|G_k^\ast| \geq 2$ for all $k \in [K]$. Let $\hat{\mathbf Z}(S)$ be the solution of the SDP problem \ref{main:SDP_objective_submatrix} corresponding to the subset $S\in \mathcal{S}$. Then where $C_3>0$ is some constant. Furthermore, under the regime of $(s \log p)/n=o(1)$ and $n \geq 2$, no clustering rule can guaran

Figures (2)

  • Figure 1: Schematic comparison of the block coordinate ascent algorithms.
  • Figure 2: Clustering accuracies under (a, b) known and (c, d) unknown covariance settings. Thick lines denote our proposed algorithms. In (a), we compare against sparse and non-sparse baselines, where dotted curves indicate non-sparse methods (representing our initialization prior to feature selection). In (b, c, d), we compare against sparsity-aware baselines. Full parameter settings are detailed in Table \ref{main:tab:sim_summary}.

Theorems & Definitions (12)

  • Theorem 1
  • Definition 2: Bernoulli reward
  • Remark 3
  • Lemma 4
  • proof
  • Theorem 5: Pull Bound for Noise Features
  • Theorem 6: Regret bound for oracle TVS
  • Theorem 7: Variable Selection Consistency
  • Corollary 8
  • proof
  • ...and 2 more