Data Clustering and Visualization with Recursive Goemans-Williamson MaxCut Algorithm
An Ly, Raj Sawhney, Marina Chugunova
TL;DR
The paper addresses unsupervised clustering of large collections of biomedical publications using a Goemans-Williamson MaxCut formulation. It introduces a recursive Goemans-Williamson algorithm (GWA) with higher-dimensional relaxations and a vectorization scheme based on conditional probabilities, complemented by PCA-based visualization. A key theoretical result is an approximation bound with $\alpha > 0.878$. Empirical studies on synthetic datasets show improved clustering density and separability, with diminishing returns as the relaxation dimension increases. The approach offers a scalable, density-focused framework for organizing biomedical literature, with potential benefits for information retrieval and discovery; future work will address outliers and parameter tuning.
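For context, the constant 0.878 matches the classical Goemans-Williamson guarantee for random-hyperplane rounding, which can be written as below. The symbols $W_{\text{GW}}$ (expected weight of the rounded cut) and $W_{\text{OPT}}$ (weight of an optimal cut) are labels chosen for this note; whether the paper's bound for the recursive variant takes exactly this form is not stated in this summary.

$$\mathbb{E}[W_{\text{GW}}] \ge \alpha\, W_{\text{OPT}}, \qquad \alpha = \min_{0 < \theta \le \pi} \frac{2}{\pi}\,\frac{\theta}{1 - \cos\theta} \approx 0.87856.$$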
Abstract
In this article, we introduce a novel recursive modification of the classical Goemans-Williamson MaxCut algorithm that improves performance in clustering vectorized data. Focusing on the clustering of medical publications, we combine recursive iterations with a dimension relaxation method to significantly increase the density of the resulting clusters. We also propose a vectorization technique for articles that uses conditional probabilities to make clustering more effective. Our methods provide advantages in both computational efficiency and clustering accuracy, substantiated through comprehensive experiments.
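To make the idea of recursive MaxCut clustering concrete, here is a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: edge weights are squared Euclidean distances between data vectors, the semidefinite relaxation is replaced by a low-rank surrogate (unit vectors in $\mathbb{R}^k$ optimized by projected gradient descent) rather than solved with an SDP solver, and the names `relax_maxcut`, `round_hyperplane`, `recursive_maxcut`, and the parameters `k`, `max_depth`, and `min_size` are hypothetical.

```python
import numpy as np

def relax_maxcut(W, k=3, iters=300, rng=None):
    """Low-rank surrogate for the MaxCut vector relaxation (assumption:
    projected gradient descent instead of a true SDP solver)."""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.normal(size=(W.shape[0], k))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    step = 1.0 / max(W.sum(axis=1).max(), 1e-12)
    for _ in range(iters):
        # Minimizing sum_ij w_ij <v_i, v_j> maximizes the relaxed cut value.
        V = V - step * (W @ V)
        V /= np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return V

def round_hyperplane(W, V, trials=20, rng=None):
    """Random-hyperplane rounding; keep the best cut over several trials."""
    rng = np.random.default_rng() if rng is None else rng
    best_side, best_val = None, -1.0
    for _ in range(trials):
        r = rng.normal(size=V.shape[1])
        side = (V @ r) >= 0
        val = W[np.ix_(side, ~side)].sum()  # total weight crossing the cut
        if val > best_val:
            best_side, best_val = side, val
    return best_side

def recursive_maxcut(X, max_depth=2, min_size=4, k=3, seed=0):
    """Split the data in two with MaxCut, then recurse on each side."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)

    def split(idx, depth, label):
        labels[idx] = label
        if depth == max_depth or len(idx) < min_size:
            return
        P = X[idx]
        # Assumed weights: squared Euclidean distance between points.
        W = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
        side = round_hyperplane(W, relax_maxcut(W, k, rng=rng), rng=rng)
        if side.all() or not side.any():
            return  # degenerate cut, stop splitting this branch
        split(idx[side], depth + 1, 2 * label + 1)
        split(idx[~side], depth + 1, 2 * label + 2)

    split(np.arange(len(X)), 0, 0)
    return labels
```

Calling `recursive_maxcut(X, max_depth=1)` on an `(n, d)` array yields a single two-way split, while larger `max_depth` values refine each side further. The parameter `k` plays the role of the relaxation dimension in this sketch; how the paper actually sets its dimension and stopping rule is not specified here.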
