A Euclidean Distance Matrix Model for Convex Clustering

Zhaowei Wang, Xiaowen Liu, Qingna Li

TL;DR

This work introduces a Euclidean Distance Matrix (EDM) reformulation of convex clustering by embedding input points and centroids into a distance matrix, linking the SON framework to EDM theory. It then develops a Majorization Penalty Method (MP-EDM) to solve the resulting nonconvex EDM($r$) problem, using a majorization of the rank-constrained PSD cone and a projection-based operator to obtain convergence guarantees. The authors prove exact recovery under suitable choices of the embedding dimension $r$ and the parameter $\gamma$, and report competitive clustering accuracy and improved scalability on real-world datasets compared to established methods. The approach offers a principled, scalable alternative for convex clustering that combines theoretical guarantees with practical performance.
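As a rough illustration of the distance-matrix embedding described above, the following sketch stacks data points and candidate centroids, forms their squared Euclidean distance matrix, and checks the classical EDM characterization (zero diagonal, and $-\tfrac{1}{2}JDJ$ positive semidefinite with rank equal to the embedding dimension). The function names, the row-wise data layout, and the random test data are assumptions made for this example; it is not the paper's MP-EDM algorithm.

```python
import numpy as np

def squared_edm(points):
    """Squared Euclidean distance matrix for an (m, d) array of row points."""
    sq = np.sum(points ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(D, 0.0)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(D, 0.0)

def gram_from_edm(D):
    """Doubly centred Gram matrix -0.5 * J D J, with J = I - (1/m) 11^T."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return -0.5 * J @ D @ J

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))   # hypothetical data: 5 points in R^2, one per row
X = rng.normal(size=(5, 2))   # hypothetical centroids, one per data point

D = squared_edm(np.vstack([A, X]))   # 10 x 10 matrix over points and centroids
B = gram_from_edm(D)
eigvals = np.linalg.eigvalsh(B)
rank = int(np.sum(eigvals > 1e-9))   # numerical rank = embedding dimension
```

Here `rank` should equal 2 because all ten points live in $\mathbb{R}^2$; a rank constraint of this kind is what makes the EDM($r$) problem nonconvex.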

Abstract

Clustering has been one of the most basic and essential problems in unsupervised learning due to its applications in many critical fields. The sum-of-norms (SON) model proposed by Pelckmans et al. (2005), Lindsten et al. (2011) and Hocking et al. (2011) has received a lot of attention. An advantage of the SON model is its theoretical guarantee of perfect recovery, established by Sun et al. (2018). The model also lends itself to the design of efficient algorithms. The semismooth Newton-based augmented Lagrangian method by Sun et al. (2018) has demonstrated superior performance over the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA). In this paper, we propose a Euclidean distance matrix model based on the SON model. An efficient majorization penalty algorithm is proposed to solve the resulting model. Extensive numerical experiments are conducted to demonstrate the efficiency of the proposed model and the majorization penalty algorithm.
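For orientation, the SON model referred to in the abstract is commonly written as $\min_X \tfrac{1}{2}\sum_i \|x_i - a_i\|^2 + \gamma \sum_{i<j} \|x_i - x_j\|$. The sketch below evaluates that objective in its unweighted $\ell_2$ form; the function name and the row-wise layout are assumptions for this example, and the paper may use a weighted variant.

```python
import numpy as np

def son_objective(X, A, gamma):
    """Unweighted sum-of-norms objective (illustrative sketch).

    X, A: (n, d) arrays holding centroids and data points as rows.
    gamma: nonnegative weight on the pairwise fusion penalty.
    """
    fit = 0.5 * np.sum((X - A) ** 2)
    # Pairwise l2 distances between centroids; each pair i < j is counted once.
    diffs = X[:, None, :] - X[None, :, :]
    norms = np.linalg.norm(diffs, axis=2)
    fusion = np.sum(np.triu(norms, k=1))
    return fit + gamma * fusion

A = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
no_fusion = son_objective(A, A, gamma=1.0)            # centroids equal the data
X_merged = np.tile(A.mean(axis=0), (3, 1))
one_cluster = son_objective(X_merged, A, gamma=1.0)   # all centroids merged
```

With centroids equal to the data, the fit term vanishes and only the fusion penalty remains; with all centroids merged at the mean, the penalty vanishes and only the fit term remains. Increasing `gamma` shifts the minimizer toward the merged configuration, which is how the model produces clusters.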

Paper Structure

This paper contains 13 sections, 8 theorems, 43 equations, 3 figures, 3 tables, 4 algorithms.

Key Result

Theorem 1

(Sun et al., 2018) Given input data $A=\left[a_1, a_2, \cdots, a_n\right] \in \mathbb{R}^{d \times n}$ and its partitioning $\mathcal{V}=\left\{V_1, V_2, \ldots, V_K\right\}$. Assume that all the clustering centers $\left\{\mathbf{a}^{(1)}, \mathbf{a}^{(2)}, \ldots, \mathbf{a}^{(K)}\right\}$ are

Figures (3)

  • Figure 1: (a) $\theta\left( t \right)$ in Case 2.1. (b) $\varphi \left( \alpha \right)$ in Case 2.2.
  • Figure 2: RI and NMI of different parameter combinations.
  • Figure 3: The cputime, RI and NMI of different $r$'s.

Theorems & Definitions (18)

  • Theorem 1
  • Remark 1
  • Definition 1
  • Proposition 1
  • Lemma 1
  • Remark 2
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • ...and 8 more