Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra

Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi

TL;DR

Kernel K-means enables non-linear clustering, but its $O(n^2 d)$ preprocessing cost and $O(n^2)$ per-iteration cost limit scalability. The paper reformulates the algorithm in terms of sparse-dense matrix operations (SpMM and SpMV), which enables an efficient GPU implementation, named Popcorn, that avoids explicit high-dimensional projections. Popcorn achieves speedups of up to $123.8\times$ over a CPU implementation and up to $2.6\times$ over a dense GPU baseline, supporting sparse linear algebra as a practical tool for high-performance clustering. These gains rest on a GPU-centric design and a dynamic strategy that selects between GEMM and SYRK when constructing the kernel matrix. The implementation is publicly available, and distributed extensions are planned.

Abstract

K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of classical K-means that can find non-linearly separable clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even medium-sized datasets on traditional CPU-based machines. In this paper, we present a formulation of Kernel K-means using sparse-dense matrix multiplication (SpMM) and sparse matrix-vector multiplication (SpMV), and we show that our formulation enables the rapid implementation of a fast GPU-based version of Kernel K-means with little programming effort. Our implementation, named Popcorn, is the first open-source GPU-based implementation of Kernel K-means. Popcorn achieves a speedup of up to 123.8x over a CPU implementation of Kernel K-means and a speedup of up to 2.6x over a GPU implementation of Kernel K-means that does not use sparse matrix computations. Our results support the effectiveness of sparse matrices as tools for efficient parallel programming.
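
To make the SpMM/SpMV formulation concrete, here is a minimal sketch, not the paper's GPU code. It assumes the standard kernel K-means expansion $\|\phi(x_i) - c_j\|^2 = K_{ii} - 2(\mathbf{KV^T})_{ij} + (\mathbf{VKV^T})_{jj}$, where $\mathbf{V}$ is taken to be a sparse $k \times n$ matrix with $V_{ji} = 1/|C_j|$ if point $i$ belongs to cluster $j$ (an assumption about how $\mathbf{V}$ in Figure 1 is defined). The function names, the RBF kernel, and the pure-Python sparse storage are all hypothetical choices made for readability.

```python
import math


def rbf_kernel_matrix(points, gamma=1.0):
    """Dense n x n kernel matrix, K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def build_v(labels, k):
    """Sparse k x n matrix V, stored row-wise as (column, value) pairs.

    Row j holds 1/|C_j| at every column i whose point is in cluster j
    (assumed definition). An empty cluster yields an empty row.
    """
    members = [[] for _ in range(k)]
    for i, j in enumerate(labels):
        members[j].append(i)
    return [[(i, 1.0 / len(m)) for i in m] for m in members]


def kernel_kmeans_distances(K, V, labels):
    """Squared feature-space distances from every point to every centroid."""
    n, k = len(K), len(V)
    # SpMM: E = K V^T (n x k), visiting only the nonzeros of V.
    E = [[sum(w * K[i][c] for c, w in V[j]) for j in range(k)]
         for i in range(n)]
    # Gather z_i = E[i][label(i)]; then diag(V K V^T) = V z is one SpMV.
    z = [E[i][labels[i]] for i in range(n)]
    centroid_norms = [sum(w * z[i] for i, w in V[j]) for j in range(k)]
    return [[K[i][i] - 2.0 * E[i][j] + centroid_norms[j] for j in range(k)]
            for i in range(n)]


def kernel_kmeans(K, labels, k, max_iter=50):
    """Reassign points to the nearest centroid until the labels stop changing."""
    for _ in range(max_iter):
        D = kernel_kmeans_distances(K, build_v(labels, k), labels)
        new_labels = [min(range(k), key=row.__getitem__) for row in D]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    K = rbf_kernel_matrix(pts, gamma=0.1)
    print(kernel_kmeans(K, [0, 0, 0, 1], k=2))
```

Because each column of $\mathbf{V}$ has a single nonzero, the SpMM reads every entry of $\mathbf{K}$ once per iteration, and the centroid norms reduce to one SpMV over $n$ nonzeros. In this sketch that is why the points are never projected into the feature space explicitly.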

Paper Structure

This paper contains 38 sections, 24 equations, 8 figures, 2 tables, 2 algorithms.

Figures (8)

  • Figure 1: Computing the diagonal of $\mathbf{VKV^T}$ using SpMV.
  • Figure 2: Comparison of the kernel matrix computation for synthetic data with SYRK and with GEMM.
  • Figure 3: Baseline CUDA implementation speedup over CPU varying $k$.
  • Figure 4: Speedup of Popcorn's pairwise distances algorithm over the baseline CUDA implementation varying $k$.
  • Figure 5: Comparison of throughput between the pairwise distances algorithm of Popcorn and the baseline CUDA implementation for varying $k$.
  • ...and 3 more figures