Data Clustering and Visualization with Recursive Goemans-Williamson MaxCut Algorithm

An Ly, Raj Sawhney, Marina Chugunova

TL;DR

The paper addresses unsupervised clustering of large collections of biomedical publications using a Goemans-Williamson MaxCut formulation. It introduces a recursive Goemans-Williamson algorithm (GWA) with higher-dimensional relaxations and a vectorization scheme based on conditional probabilities, complemented by PCA-based visualization. A key theoretical result is an approximation bound with $\alpha > 0.878$. Empirical studies on synthetic datasets show improved clustering density and separability, with diminishing returns from higher dimensions. The approach offers a scalable, density-focused framework for organizing biomedical literature, with potential benefits for information retrieval and discovery; future work will address outliers and parameter tuning.
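
For orientation, the classical (non-recursive) Goemans-Williamson guarantee that the constant $0.878$ comes from can be written as follows. This is the textbook statement for a graph with non-negative edge weights $w_{ij}$; the symbols $x_i$, $v_i$, and $r$ are notation chosen here, and the paper's own bound for the recursive variant may be stated differently.

$$
\max_{x \in \{-1,1\}^n} \; \frac{1}{2} \sum_{i<j} w_{ij}\,(1 - x_i x_j)
\quad \text{is relaxed to} \quad
\max_{\|v_i\| = 1} \; \frac{1}{2} \sum_{i<j} w_{ij}\,(1 - v_i \cdot v_j),
$$

$$
x_i = \operatorname{sign}(r \cdot v_i) \ \text{for a random Gaussian vector } r
\quad \Longrightarrow \quad
\mathbb{E}[\text{cut}] \;\ge\; \alpha \cdot \mathrm{OPT},
\qquad
\alpha = \min_{0 < \theta \le \pi} \frac{2}{\pi} \cdot \frac{\theta}{1 - \cos\theta} \approx 0.87856 .
$$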

Abstract

In this article, we introduce a novel recursive modification to the classical Goemans-Williamson MaxCut algorithm, offering improved performance in vectorized data clustering tasks. Focusing on the clustering of medical publications, we employ recursive iterations in conjunction with a dimension relaxation method to significantly enhance the density of clustering results. Furthermore, we propose a unique vectorization technique for articles, leveraging conditional probabilities for more effective clustering. Our methods provide advantages in both computational efficiency and clustering accuracy, substantiated through comprehensive experiments.
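
To make the idea of recursive MaxCut clustering concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a graph whose edge weights are pairwise distances between data vectors, and it replaces the semidefinite program of the classical algorithm with a simple low-rank surrogate (projected gradient steps on unit vectors), followed by the usual random-hyperplane rounding. The function names `gw_style_cut` and `recursive_cut`, and all parameters (`dim`, `steps`, `lr`, `trials`, `depth`, `min_size`), are hypothetical choices made for illustration, and the recursion shown (re-splitting each side of a cut) is only one plausible reading of "recursive iterations".

```python
import numpy as np


def gw_style_cut(W, dim=8, steps=200, lr=0.5, trials=20, seed=0):
    """Bisect a weighted graph: low-rank relaxation + hyperplane rounding.

    W is a symmetric, non-negative weight matrix (assumed here to hold
    pairwise distances). Returns (labels, cut_value).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    # One unit vector per node, in a `dim`-dimensional relaxation space.
    V = rng.normal(size=(n, dim))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    # Scale the step by the largest row sum so it does not depend on units.
    step = lr / max(W.sum(axis=1).max(), 1e-12)
    for _ in range(steps):
        # Decrease sum_ij W_ij <v_i, v_j>: heavily weighted pairs are pushed apart.
        V -= step * (W @ V)
        V /= np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    best_labels, best_cut = None, -1.0
    for _ in range(trials):
        # Random hyperplane through the origin; the side of the plane is the label.
        r = rng.normal(size=dim)
        labels = (V @ r >= 0).astype(int)
        # Each cut edge appears twice in the symmetric mask, hence the / 2.
        cut = W[labels[:, None] != labels[None, :]].sum() / 2
        if cut > best_cut:
            best_labels, best_cut = labels, cut
    return best_labels, best_cut


def recursive_cut(W, idx=None, depth=2, min_size=2, seed=0):
    """Recursively bisect the nodes in `idx`; returns a list of index arrays."""
    if idx is None:
        idx = np.arange(W.shape[0])
    if depth == 0 or len(idx) <= min_size:
        return [idx]
    labels, _ = gw_style_cut(W[np.ix_(idx, idx)], seed=seed)
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) == 0 or len(right) == 0:
        return [idx]  # degenerate cut: stop splitting this branch
    return (recursive_cut(W, left, depth - 1, min_size, seed + 1)
            + recursive_cut(W, right, depth - 1, min_size, seed + 2))


if __name__ == "__main__":
    # Toy data: two tight, well-separated blobs in the plane.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
                   rng.normal(0.0, 0.1, size=(10, 2)) + 10.0])
    W = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for cluster in recursive_cut(W, depth=1):
        print(sorted(cluster.tolist()))
```

Using distances as edge weights means a large cut separates points that are far apart, which is why maximizing the cut behaves like clustering in this sketch. A faithful reproduction of the classical algorithm would solve the semidefinite relaxation with an SDP solver instead of the gradient surrogate used here.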

Paper Structure (9 sections, 6 equations, 6 figures)

Figures (6)

  • Figure 1: GWA Results and PCA Visualization over 3 Iterations: Two Cubes
  • Figure 2: GWA Results and PCA Visualization over 3 Iterations: Interlocking Data
  • Figure 3: First Iteration with 104 Dimensions (Left) and 109 Dimensions (Right)
  • Figure 4: Second Iteration with 104 Dimensions (Left) and 109 Dimensions (Right)
  • Figure 5: Third Iteration with 104 Dimensions (Left) and 109 Dimensions (Right)
  • ...and 1 more figures