PyClustrPath: An efficient Python package for generating clustering paths with GPU acceleration

Hongfei Wu, Yancheng Yuan

TL;DR

This paper tackles clustering without fixing the number of clusters in advance. It generates a clustering path by solving the convex clustering model $\min_{x_1,\dots,x_n \in \mathbb{R}^d} \tfrac{1}{2} \sum_{i=1}^n \|x_i - a_i\|_2^2 + \gamma \sum_{i<j} w_{ij} \|x_i - x_j\|_q$ for each value in a sequence of regularization parameters γ, where the $a_i$ are the data points and the $w_{ij}$ are pairwise weights. It introduces PyClustrPath, a GPU-accelerated Python package that implements ADMM, fast AMA, and SSNAL within a modular, extensible framework built on Python 3.10+ and PyTorch, using sparse Cholesky factorization (via cholespy) and PCG for efficiency. The authors report substantial speedups over CPU solvers on five benchmark datasets, with SSNAL on GPU performing best, especially on large-scale data such as MNIST. The work provides a scalable tool for computing convex clustering paths and outlines future directions: algorithmic improvements and generalizations to related clustering models.
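To make the objective above concrete, the following is a minimal sketch that evaluates it for a given candidate solution. It is not part of PyClustrPath: the function name `convex_clustering_objective`, the dictionary layout for the weights, and the use of plain Python lists are assumptions made for illustration only.

```python
import math


def convex_clustering_objective(X, A, weights, gamma, q=2):
    """Evaluate 0.5 * sum_i ||x_i - a_i||_2^2 + gamma * sum_{i<j} w_ij * ||x_i - x_j||_q.

    X, A    : lists of n points, each a list of d floats (candidate centroids, data)
    weights : dict mapping (i, j) with i < j to w_ij; absent pairs are treated as 0
              (hypothetical layout chosen for this sketch)
    q       : norm order for the fusion term (q >= 1, or math.inf)
    """
    # Fidelity term: squared Euclidean distance between each x_i and its data point a_i.
    fidelity = 0.5 * sum(
        (xk - ak) ** 2 for x, a in zip(X, A) for xk, ak in zip(x, a)
    )

    # Fusion term: weighted q-norm of pairwise differences x_i - x_j.
    fusion = 0.0
    for (i, j), w in weights.items():
        diff = [abs(u - v) for u, v in zip(X[i], X[j])]
        if q == math.inf:
            norm = max(diff)
        else:
            norm = sum(t ** q for t in diff) ** (1.0 / q)
        fusion += w * norm

    return fidelity + gamma * fusion
```

With γ = 0 the minimizer is simply $x_i = a_i$ (objective value 0); as γ grows, the fusion term increasingly penalizes distinct centroids, which is what drives points to merge along the path.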

Abstract

Convex clustering is a popular clustering model that does not require the number of clusters as prior knowledge. It can generate a clustering path by solving the model repeatedly over a sequence of regularization parameter values. This paper introduces PyClustrPath, a highly efficient Python package for solving the convex clustering model with GPU acceleration. PyClustrPath implements popular first-order and second-order algorithms with a clean modular design, which makes it easy to extend with new algorithms for solving the convex clustering model in the future. We extensively test the numerical performance of PyClustrPath on popular clustering datasets, demonstrating its superior performance compared to existing solvers for generating the clustering path based on the convex clustering model. The implementation of PyClustrPath can be found at: https://github.com/D3IntOpt/PyClustrPath.
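As a small illustration of what a clustering path is, consider the special case of two data points with q = 2. Here the model has a closed-form solution: the mean of the two centroids stays at the mean of the data, and the difference $x_1 - x_2$ is the block soft-thresholding of $a_1 - a_2$ with threshold $2\gamma w_{12}$. The sketch below uses this closed form only; it does not use PyClustrPath or any of its solvers (ADMM, fast AMA, SSNAL), and the function names are hypothetical.

```python
import math


def two_point_solution(a1, a2, gamma, w=1.0):
    """Closed-form minimizer of the convex clustering model for n = 2 and q = 2."""
    delta = [u - v for u, v in zip(a1, a2)]
    dist = math.sqrt(sum(t * t for t in delta))
    # Block soft-thresholding: x1 - x2 = shrink * (a1 - a2), and shrink hits 0
    # (the two points fuse into one cluster) once gamma >= dist / (2 * w).
    shrink = max(0.0, 1.0 - 2.0 * gamma * w / dist) if dist > 0 else 0.0
    mean = [(u + v) / 2.0 for u, v in zip(a1, a2)]
    x1 = [m + shrink * t / 2.0 for m, t in zip(mean, delta)]
    x2 = [m - shrink * t / 2.0 for m, t in zip(mean, delta)]
    return x1, x2


def two_point_path(a1, a2, gammas, w=1.0):
    """Solve the two-point model for each gamma, returning (gamma, x1, x2) triples."""
    return [(g, *two_point_solution(a1, a2, g, w)) for g in gammas]


# Example: the data points are 5 apart, so they fuse once gamma reaches 2.5.
path = two_point_path([0.0, 0.0], [3.0, 4.0], gammas=[0.0, 1.25, 2.5, 3.0])
```

Reading the triples in order of increasing γ shows the two centroids moving toward their common mean and then coinciding. For general n no such closed form exists, which is why iterative solvers are needed.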

Paper Structure

This paper contains 14 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Components, Workflow and Visualization results.
  • Figure 2: Demo code for the LIBRAS-6 dataset.
  • Figure 3: A generated clustering path for the LIBRAS-6 dataset.
  • Figure 4: Demo code for the COIL-20 dataset.
  • Figure 5: A generated clustering path for the COIL-20 dataset.
  • ...and 5 more figures