Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping

Jinghui Yuan, Chusheng Zeng, Fangyuan Xie, Zhe Cao, Mulin Chen, Rong Wang, Feiping Nie, Yuan Yuan

TL;DR

The paper tackles learning sparse, symmetric, doubly stochastic graphs for clustering. It extends Marcus theorem through the Marcus mapping, a diagonal-scaling transform that yields a doubly stochastic symmetric matrix even for certain sparse similarity matrices $S$. It combines this mapping with adaptive local structure learning under a Laplacian-based rank constraint, producing the ANCMM algorithm, whose learned graph has exactly $c$ connected components. The authors provide theoretical results (the Marcus mapping theorem and its connection to Sinkhorn scaling and optimal transport) and develop an efficient optimization routine with provable convergence, reporting better performance than state-of-the-art baselines on synthetic and real datasets. The work offers a practical, scalable approach to graph-based clustering with exact cluster separation guarantees and a clear link to optimal transport theory, suggesting applicability in spectral clustering and related tasks.
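The link between the Laplacian rank constraint and the number of clusters rests on a standard fact: for a nonnegative symmetric similarity matrix $S$ with Laplacian $L = D - S$, the multiplicity of the eigenvalue 0 equals the number of connected components, so $\operatorname{rank}(L) = n - c$ means exactly $c$ components. The following is a minimal sketch of that check, not the paper's algorithm; the function name `count_components`, the tolerance, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def count_components(S, tol=1e-10):
    """Count connected components as the number of (near-)zero Laplacian eigenvalues.

    Hypothetical helper for illustration; `tol` is an assumed numerical threshold.
    """
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2                      # enforce symmetry
    L = np.diag(S.sum(axis=1)) - S         # graph Laplacian L = D - S
    eigvals = np.linalg.eigvalsh(L)        # real eigenvalues, ascending
    return int(np.sum(eigvals < tol))

# Toy doubly stochastic graph with two blocks (sizes 3 and 2).
S = np.zeros((5, 5))
S[:3, :3] = 1.0 / 3.0
S[3:, 3:] = 1.0 / 2.0

n_components = count_components(S)
laplacian_rank = S.shape[0] - n_components
```

Here the toy graph has two disconnected blocks, so the Laplacian has two zero eigenvalues and rank $n - c = 3$.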

Abstract

Clustering is a fundamental task in machine learning and data science, and similarity graph-based clustering is an important approach within this domain. Doubly stochastic symmetric similarity graphs provide numerous benefits for clustering problems and downstream tasks, yet learning such graphs remains a significant challenge. Marcus theorem states that a strictly positive symmetric matrix can be transformed into a doubly stochastic symmetric matrix by diagonal matrices. However, in clustering, learning sparse matrices is crucial for computational efficiency. We extend Marcus theorem by proposing the Marcus mapping, which indicates that certain sparse matrices can also be transformed into doubly stochastic symmetric matrices via diagonal matrices. Additionally, we introduce rank constraints into the clustering problem and propose the Doubly Stochastic Adaptive Neighbors Clustering algorithm based on the Marcus Mapping (ANCMM). This ensures that the learned graph naturally divides into the desired number of clusters. We validate the effectiveness of our algorithm through extensive comparisons with state-of-the-art algorithms. Finally, we explore the relationship between the Marcus mapping and optimal transport. We prove that the Marcus mapping solves a specific type of optimal transport problem and demonstrate that solving this problem through Marcus mapping is more efficient than directly applying optimal transport methods.
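To make the statement of Marcus theorem concrete, the sketch below scales a strictly positive symmetric matrix $A$ by a single positive diagonal matrix $D$ so that $DAD$ is symmetric and doubly stochastic. It uses a generic damped fixed-point iteration, $d \leftarrow \sqrt{d / (Ad)}$, whose fixed points satisfy $d_i (Ad)_i = 1$; this is an assumed illustrative scheme, not the Marcus mapping procedure from the paper, and the function name `symmetric_scale`, the iteration cap, and the tolerance are hypothetical.

```python
import numpy as np

def symmetric_scale(A, n_iter=1000, tol=1e-12):
    """Return a positive vector d such that diag(d) @ A @ diag(d) has unit row sums.

    Illustrative damped iteration d <- sqrt(d / (A d)) for a strictly positive
    symmetric A; not the paper's algorithm.
    """
    A = np.asarray(A, dtype=float)
    d = np.ones(A.shape[0])
    for _ in range(n_iter):
        d_new = np.sqrt(d / (A @ d))       # fixed point satisfies d * (A d) = 1
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
    return d

# Random strictly positive symmetric matrix (assumed test input).
rng = np.random.default_rng(0)
B = rng.random((5, 5)) + 0.1
A = (B + B.T) / 2

d = symmetric_scale(A)
P = d[:, None] * A * d[None, :]            # P = D A D
```

Because the same diagonal matrix is applied on both sides, `P` stays symmetric, and unit row sums then imply unit column sums.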

Paper Structure (31 sections, 32 equations, 3 figures, 5 tables, 2 algorithms)


Figures (3)

  • Figure 1: Proof Diagram
  • Figure 2: Visualization on toy data. (a) Original data. (b) The Gaussian affinity matrix. (c) Result of CAN method. (d) CAN clustering error samples. (f) Result of ANCMM method. (g) ANCMM clustering error samples.
  • Figure 3: Convergence experimental results on real-world datasets. (a) Ecoli. (b) LetterRecognition. (c) Movement. (d) Wine.